FIELD OF THE INVENTION

The invention relates generally to polynucleotides and the polypeptides encoded
thereby, and more particularly to polynucleotides encoding polypeptides that cross one or more
membranes in eukaryotic cells.

BACKGROUND OF THE INVENTION 

Eukaryotic cells are subdivided by membranes into multiple, functionally distinct compartments,
referred to as organelles. Many biologically important proteins are secreted from the cell after crossing
multiple membrane-bound organelles. These proteins can often be identified by the presence of sequence
motifs referred to as "sorting signals" in the protein, or in a precursor form of the protein. These sorting
signals can also aid in targeting the proteins to their appropriate destination.

One specific type of sorting signal is a signal sequence, which is also referred to as a signal
peptide or leader sequence. This signal sequence can be present as an amino-terminal extension
on a newly synthesized polypeptide. A signal sequence possesses the ability to "target" proteins to an
organelle known as the endoplasmic reticulum (ER).

The signal sequence takes part in an array of protein-protein and protein-lipid interactions that
result in the translocation of a signal sequence-containing polypeptide through a channel within the ER.
Following translocation, a membrane-bound enzyme, designated signal peptidase, liberates the mature
protein from the signal sequence.

Secreted and membrane-bound proteins are involved in many biologically diverse
activities. Examples of known secreted proteins include insulin, interferon, interleukin,
transforming growth factor-β, human growth hormone, erythropoietin, and lymphokine.
Only a limited number of genes encoding human membrane-bound and secreted proteins have
been identified.



Failure to thrive, nutritional edema, and hypoproteinemia with normal sweat electrolytes
in 2 affected male infants reported by Townes et al. (J. Pediat. 71: 220-224, 1967) could be
treated by a protein hydrolysate diet. Morris and Fisher (Am. J. Dis. Child. 114: 203-208, 1967)
reported an affected female who also had imperforate anus, a result of a defect in the synthesis of
the enterokinase which activates proteolytic enzymes produced by the pancreas. Oral pancreatin
represents a therapeutically successful form of enzyme replacement. Trypsin, like elastase, is a
member of the pancreatic family of serine proteases. MacDonald et al. (J. Biol. Chem. 257:
9724-9732, 1982) reported nucleotide sequences of cDNAs representing 2 pancreatic rat
trypsinogens. The trypsin gene is on mouse chromosome 6 (Honey et al., Somat. Cell Molec.
Genet. 10: 369-376, 1984). Carboxypeptidase A and trypsin are a syntenic pair conserved in
mouse and man. Emi et al. (Gene 41: 305-310, 1986) isolated cDNA clones for 2 major human
trypsinogen isozymes from a pancreatic cDNA library. The deduced amino acid sequences had
89% homology and the same number of amino acids (247), including a 15-amino acid signal
peptide and an 8-amino acid activation peptide. Southern blot analysis of human genomic DNA
with the cloned cDNA as a probe showed that the human trypsinogen genes constitute a family
of more than 10. The gene encoding trypsin-1 (TRY1) is also referred to as serine protease-1
(PRSS1). Rowen et al. (Science 272: 1755-1762, 1996) found that there are 8 trypsinogen genes
embedded in the beta T-cell receptor locus or cluster of genes (TCRB) mapping to 7q35. In the
685-kb DNA segment that they sequenced they found 5 tandemly arrayed 10-kb locus-specific
repeats (homology units) at the 3-prime end of the locus. These repeats exhibited 90 to 91%
overall nucleotide similarity, and embedded within each is a trypsinogen gene. Alignment of
pancreatic trypsinogen cDNAs with the germline sequences showed that these trypsinogen genes
contain 5 exons that span approximately 3.6 kb. They denoted the 8 trypsinogen genes T1 through
T8 from 5-prime to 3-prime. Some of the trypsinogen genes are expressed in nonpancreatic
tissues where their function is unknown. Rowen et al. (Science 272: 1755-1762, 1996) noted that
the intercalation of the trypsinogen genes in the TCRB locus is conserved in mouse and chicken,
suggesting shared functional or regulatory constraints, as has been postulated for genes in the
major histocompatibility complex (such as class I, II, and III genes) that share similar long-term
organizational relationships. The gene of the invention is a novel serine protease containing a trypsin
domain but localized on chromosome 16.

SUMMARY OF THE INVENTION

The invention is based, in part, upon the discovery of novel nucleic acids and secreted 
polypeptides encoded thereby. The nucleic acids and polypeptides are collectively referred to 
herein as "SECP". 

Accordingly, in one aspect, the invention includes an isolated nucleic acid that encodes a
SECP polypeptide, or a fragment, homolog, analog or derivative thereof. For example, the
nucleic acid can encode a polypeptide at least 85% identical to a polypeptide comprising the
amino acid sequence of any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55
and 57. The nucleic acid can be, e.g., a genomic DNA fragment or a cDNA molecule. In some
embodiments, the invention provides an isolated nucleic acid molecule that includes the nucleic
acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13,
15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56.

Also included within the scope of the invention is a vector containing one or more of the
nucleic acids described herein, and a cell containing the vectors or nucleic acids described
herein.

The invention is also directed to host cells transformed with a vector comprising any of 
the nucleic acid molecules described above. 

In another aspect, the invention includes a pharmaceutical composition that includes a
SECP nucleic acid and a pharmaceutically acceptable carrier or diluent.

In a further aspect, the invention includes a substantially purified SECP polypeptide, e.g.,
any of the SECP polypeptides encoded by a SECP nucleic acid, and fragments, homologs,
analogs, and derivatives thereof. The invention also includes a pharmaceutical composition that
includes a SECP polypeptide and a pharmaceutically acceptable carrier or diluent.

In a still further aspect, the invention provides an antibody that binds specifically to a
SECP polypeptide. The antibody can be, e.g., a monoclonal or polyclonal antibody, and
fragments, homologs, analogs, and derivatives thereof. The invention also includes a
pharmaceutical composition including a SECP antibody and a pharmaceutically acceptable carrier
or diluent. The invention is also directed to isolated antibodies that bind to an epitope on a
polypeptide encoded by any of the nucleic acid molecules described above.

The invention also includes kits comprising any of the pharmaceutical compositions
described above.

The invention further provides a method for producing a SECP polypeptide by providing
a cell containing a SECP nucleic acid, e.g., a vector that includes a SECP nucleic acid, and
culturing the cell under conditions sufficient to express the SECP polypeptide encoded by the
nucleic acid. The expressed SECP polypeptide is then recovered from the cell. Preferably, the
cell produces little or no endogenous SECP polypeptide. The cell can be, e.g., a prokaryotic cell
or eukaryotic cell.

The invention is also directed to methods of identifying a SECP polypeptide or nucleic
acid in a sample by contacting the sample with a compound that specifically binds to the
polypeptide or nucleic acid, and detecting complex formation, if present.

The invention further provides methods of identifying a compound that modulates the
activity of a SECP polypeptide by contacting a SECP polypeptide with a compound and
determining whether the SECP polypeptide activity is modified.

The invention is also directed to compounds that modulate SECP polypeptide activity
identified by contacting a SECP polypeptide with the compound and determining whether the
compound modifies activity of the SECP polypeptide, binds to the SECP polypeptide, or binds to
a nucleic acid molecule encoding a SECP polypeptide.

In another aspect, the invention provides a method of determining the presence of or
predisposition to a SECP-associated disorder in a subject. The method includes providing a
sample from the subject and measuring the amount of SECP polypeptide in the subject sample.
The amount of SECP polypeptide in the subject sample is then compared to the amount of SECP
polypeptide in a control sample. An alteration in the amount of SECP polypeptide in the subject
protein sample relative to the amount of SECP polypeptide in the control protein sample
indicates the subject has a tissue proliferation-associated condition. A control sample is
preferably taken from a matched individual, i.e., an individual of similar age, sex, or other
general condition but who is not suspected of having a tissue proliferation-associated condition.
Alternatively, the control sample may be taken from the subject at a time when the subject is not
suspected of having a tissue proliferation-associated disorder. In some embodiments, the SECP
is detected using a SECP antibody.
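
By way of illustration only, the comparison of a subject sample with a control sample can be sketched as a fold-change test. The function name is_altered and the two-fold cutoff are hypothetical assumptions introduced for this sketch; the Specification does not state a numerical threshold.

```python
import math


def is_altered(subject_amount: float, control_amount: float,
               min_abs_log2_fold: float = 1.0) -> bool:
    """Return True if the subject amount differs from the control amount
    by at least the given fold change, in either direction.

    min_abs_log2_fold=1.0 corresponds to a two-fold increase or decrease.
    This cutoff is an illustrative assumption, not a value from the text.
    """
    if subject_amount <= 0 or control_amount <= 0:
        raise ValueError("amounts must be positive")
    log2_fold = math.log2(subject_amount / control_amount)
    return abs(log2_fold) >= min_abs_log2_fold


# Hypothetical measurements in arbitrary units.
print(is_altered(4.0, 1.0))   # four-fold higher than control
print(is_altered(1.1, 1.0))   # within the assumed tolerance
```

Working on a log scale treats increases and decreases symmetrically, which matches the text's reference to an "alteration" without a stated direction.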

In a further aspect, the invention provides a method of determining the presence of or
predisposition to a SECP-associated disorder in a subject. The method includes providing a
nucleic acid sample (e.g., RNA or DNA, or both) from the subject and measuring the amount of
the SECP nucleic acid in the subject nucleic acid sample. The amount of SECP nucleic acid
in the subject nucleic acid sample is then compared to the amount of a SECP nucleic acid in a
control sample. An alteration in the amount of SECP nucleic acid in the sample relative to the
amount of SECP in the control sample indicates the subject has a tissue proliferation-associated
disorder.

In a still further aspect, the invention provides a method of treating or preventing or
delaying a SECP-associated disorder. The method includes administering to a subject in which
such treatment or prevention or delay is desired a SECP nucleic acid, a SECP polypeptide, or a
SECP antibody in an amount sufficient to treat, prevent, or delay a tissue proliferation-associated
disorder in the subject.

Unless otherwise defined, all technical and scientific terms used herein have the same
meaning as commonly understood by one of ordinary skill in the art to which this invention
belongs. Although methods and materials similar or equivalent to those described herein can be
used in the practice or testing of the invention, suitable methods and materials are described
below. All publications, patent applications, patents, and other references mentioned herein are
incorporated by reference in their entirety. In the case of conflict, the present Specification,
including definitions, will control. In addition, the materials, methods, and examples are
illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following 
detailed description and claims. 

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 is a representation of a SECP1 nucleic acid sequence (SEQ ID NO:1) according
to the invention, along with an amino acid sequence (SEQ ID NO:2) encoded by the nucleic acid
sequence.

FIG. 2 is a representation of a SECP2 nucleic acid sequence (SEQ ID NO:3) according
to the invention, along with an amino acid sequence (SEQ ID NO:4) encoded by the nucleic acid
sequence.

FIG. 3 is a representation of a SECP3 nucleic acid sequence (SEQ ID NO:5) according
to the invention, along with an amino acid sequence (SEQ ID NO:6) encoded by the nucleic acid
sequence.

FIG. 4 is a representation of a SECP4 nucleic acid sequence (SEQ ID NO:7) according
to the invention, along with an amino acid sequence (SEQ ID NO:8) encoded by the nucleic acid
sequence.

FIG. 5 is a representation of a SECP5 nucleic acid sequence (SEQ ID NO:9) according
to the invention, along with an amino acid sequence (SEQ ID NO:10) encoded by the nucleic
acid sequence.

FIG. 6 is a representation of a SECP6 nucleic acid sequence (SEQ ID NO:11) according
to the invention, along with an amino acid sequence (SEQ ID NO:12) encoded by the nucleic
acid sequence.

FIG. 7 is a representation of a SECP7 nucleic acid sequence (SEQ ID NO:13) according
to the invention, along with an amino acid sequence (SEQ ID NO:14) encoded by the nucleic
acid sequence.

FIG. 8 is a representation of a SECP8 nucleic acid sequence (SEQ ID NO:15) according
to the invention, along with an amino acid sequence (SEQ ID NO:16) encoded by the nucleic
acid sequence.

FIG. 9 is a representation of a SECP9 nucleic acid sequence (SEQ ID NO:17) according
to the invention, along with an amino acid sequence (SEQ ID NO:18) encoded by the nucleic
acid sequence.

FIG. 10 is a representation of an alignment of the proteins encoded by clones
11618130.0.27 (SEQ ID NO:4) and 11618130.0.184 (SEQ ID NO:16).

FIG. 11 is a representation of an alignment of the proteins encoded by clones
14578444.0.143 (SECP4; SEQ ID NO:8) and 14578444.0.47 (SECP5; SEQ ID NO:10).

FIG. 12 is a representation of a Western blot of a polypeptide expressed in 293 cells of a
polynucleotide containing sequences encoded by clone 11618130.

FIG. 13 is a representation of a Western blot of a polypeptide expressed in 293 cells of a
polynucleotide containing sequence encoded by clone 16406477.

FIG. 14 is a representation of a real-time expression analysis of the clones of the
invention.

FIG. 15 is a representation of a SECP10 nucleic acid sequence (SEQ ID NO:40)
according to the invention, along with an amino acid sequence (SEQ ID NO:41) encoded by the
nucleic acid sequence.

FIG. 16 is a representation of a SECP11 nucleic acid sequence (SEQ ID NO:42)
according to the invention, along with an amino acid sequence (SEQ ID NO:43) encoded by the
nucleic acid sequence.

FIG. 17 is a representation of a SECP12 nucleic acid sequence (SEQ ID NO:44)
according to the invention, along with an amino acid sequence (SEQ ID NO:45) encoded by the
nucleic acid sequence.

FIG. 18 is a representation of a SECP13 nucleic acid sequence (SEQ ID NO:46)
according to the invention, along with an amino acid sequence (SEQ ID NO:47) encoded by the
nucleic acid sequence.

FIG. 19 is a representation of a SECP14 nucleic acid sequence (SEQ ID NO:48)
according to the invention, along with an amino acid sequence (SEQ ID NO:49) encoded by the
nucleic acid sequence.

FIG. 20 is a representation of a SECP15 nucleic acid sequence (SEQ ID NO:50)
according to the invention, along with an amino acid sequence (SEQ ID NO:51) encoded by the
nucleic acid sequence.

FIG. 21 is a representation of a SECP16 nucleic acid sequence (SEQ ID NO:52)
according to the invention, along with an amino acid sequence (SEQ ID NO:53) encoded by the
nucleic acid sequence.

FIG. 22 is a representation of a SECP17 nucleic acid sequence (SEQ ID NO:54)
according to the invention, along with an amino acid sequence (SEQ ID NO:55) encoded by the
nucleic acid sequence.

FIG. 23 is a representation of a SECP18 nucleic acid sequence (SEQ ID NO:56)
according to the invention, along with an amino acid sequence (SEQ ID NO:57) encoded by the
nucleic acid sequence.

DETAILED DESCRIPTION OF THE INVENTION 

The invention provides novel polynucleotides and the polypeptides encoded thereby.
Included in the invention are ten novel nucleic acid sequences and their encoded polypeptides.
These sequences are collectively referred to as "SECP nucleic acids" or "SECP polynucleotides"
and the corresponding encoded polypeptide is referred to as a "SECP polypeptide" or "SECP
protein". For example, a SECP nucleic acid according to the invention is a nucleic acid
including a SECP nucleic acid, and a SECP polypeptide according to the invention is a
polypeptide that includes the amino acid sequence of a SECP polypeptide. Unless indicated
otherwise, "SECP" is meant to refer to any of the novel sequences disclosed herein. Each of the
nucleic acid and amino acid sequences has been assigned a unique SECP Identification
Number, with designations SECP1 through SECP10.

TABLE 1 provides a cross-reference to the assigned SECP Number, Clone or Probe
Identification Number, and Sequence Identification Number (SEQ ID NO:) for both the nucleic
acid and encoded polypeptides of SECP1-14.



TABLE 1

CLONE/PROBE                   FIGURE   SEQ ID NO:       SEQ ID NO:
                                       (Nucleic Acid)   (Polypeptide)

21433858                      1        1                2
11618130.0.27 (CG50817-03)    2        3                4
11696905-0-47                 3        5                6
14578444.0.143                4        7                8
14578444.0.47                 5        9                10
14998905.0.65                 6        11               12
16406477.0.206                7        13               14
11618130.0.184                8        15               16
21637262.0.64                 9        17               18
CG106318-01                   15       40               41
CG50817-04                    16       42               43
CG50817-05                    17       44               45
CG50817-06                    18       46               47
CG51099-03                    19       48               49
CG57051-04                    20       50               51
CG57051-05                    21       52               53
CG57051-02                    22       54               55
CG57051-03                    23       56               57
11618130 Forward              -        19               -
11618130 Reverse              -        20               -
PSec-V5-His Forward           -        21               -
PSec-V5-His Reverse           -        22               -
16406477 Forward              -        23               -
16406477 Reverse              -        24               -
Ag383 (F)                     -        25               -
Ag383 (R)                     -        26               -
Ag383 (P)                     -        27               -
Ag53 (F)                      -        28               -
Ag53 (R)                      -        29               -
Ag53 (P)                      -        30               -
Ag127 (F)                     -        31               -
Ag127 (R)                     -        32               -
Ag127 (P)                     -        33               -
Ab5 (F)                       -        34               -
Ab5 (R)                       -        35               -
Ab5 (P)                       -        36               -
Ag815 (F)                     -        37               -
Ag815 (R)                     -        38               -
Ag815 (P)                     -        39               -

Nucleic acid sequences and polypeptide sequences for SECP nucleic acids and
polypeptides, as disclosed herein, are provided in the following section of the Specification.

SECP nucleic acids, and their encoded polypeptides, according to the invention are useful
in a variety of applications and contexts. For example, various SECP nucleic acids and
polypeptides according to the invention are useful, inter alia, as novel members of the protein
families according to the presence of domains and sequence relatedness to previously described
proteins.

SECP nucleic acids and polypeptides according to the invention can also be used to
identify cell types based on the presence or absence of various SECP nucleic acids according to
the invention. Additional utilities for SECP nucleic acids and polypeptides are discussed below.

SECP1 

A SECP1 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:1) and encoded polypeptide sequence (SEQ ID NO:2) of clone
21433858. FIG. 1 illustrates the nucleic acid and amino acid sequences, as well as the alignment
between these two sequences.

This clone includes a nucleotide sequence (SEQ ID NO:1) of 6373 bp. The nucleotide
sequence includes an open reading frame (ORF) encoding a polypeptide of 1588 amino acid
residues (SEQ ID NO:2) with a predicted molecular weight of 178042.1 Daltons. The start
codon is located at nucleotides 235-237 and the stop codon is located at nucleotides 4999-5001.
The protein encoded by clone 21433858 is predicted by the PSORT program to localize in the
plasma membrane with a certainty of 0.7300. The program SignalP predicts that there is a signal
peptide with the most probable cleavage site located between residues 23 and 24, in the sequence
CMG-DE.
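
The relationship between the quoted codon positions and the stated polypeptide length can be checked with simple arithmetic. The following is a minimal sketch, assuming 1-based nucleotide numbering and a stop codon that is not translated; the function name orf_length_codons is hypothetical and is introduced only for illustration.

```python
def orf_length_codons(start_first_nt: int, stop_first_nt: int) -> int:
    """Number of translated codons in an open reading frame.

    Both arguments are 1-based positions of the first nucleotide of the
    start codon and of the stop codon (e.g. 235 for "235-237"). The start
    codon is counted; the stop codon is not.
    """
    span = stop_first_nt - start_first_nt
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


# Positions quoted above for clone 21433858 (SECP1).
print(orf_length_codons(235, 4999))
```

For the SECP1 coordinates, (4999 - 235) / 3 gives 1588 codons, which agrees with the 1588 amino acid residues stated for SEQ ID NO:2.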

Real-time gene expression analysis was performed on SECP1 (clone 21433858). The
results demonstrate that RNA sequences with homology to clone 21433858 are detected in
various cell types. The relative abundance of RNA homologous to clone 21433858 is shown in
FIG. 14 (see also Examples, below). Cell types include endothelial cells (treated and untreated),
pancreas, adipose, adrenal gland, thyroid, mammary gland, myometrium, uterus, placenta,
prostate, testis, and neoplastic cells derived from ovarian carcinoma OVCAR-3, ovarian
carcinoma OVCAR-5, ovarian carcinoma OVCAR-8, ovarian carcinoma IGROV-1, ovarian
carcinoma (ascites) SK-OV-3, breast carcinoma BT-549, prostate carcinoma (bone metastases)
PC-3, melanoma M14, and melanoma (met) SK-MEL-5. Accordingly, SECP1 nucleic acids
according to the invention can be used to identify one or more of these cell types. The presence
of RNA sequences homologous to a SECP1 nucleic acid in a sample indicates that the sample
contains one or more of the above cell types.

A search of sequence databases using BLASTX reveals that residues 299-1588 of the
polypeptide encoded by clone 21433858 are 100% identical to the 1290 residue human KIAA0960
protein (ACC: SPTREMBL-ACC:Q9UPZ6). In addition, the protein of clone 21433858 has 542
of 543 residues (99%) identical to, and 543 of 543 residues (100%) positive with, the 543 residue
fragment of a human hypothetical protein (SPTREMBL-ACC:O60407).
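
The percentages quoted with the residue counts can be reproduced from the counts themselves. The sketch below assumes that percentages are reported as whole numbers rounded down, an inference from 542 of 543 being given as 99%; the function name percent_floor is hypothetical.

```python
def percent_floor(matches: int, aligned: int) -> int:
    """Whole-number percentage of matching residues, rounded down.

    matches: number of identical (or positive) residues in the alignment.
    aligned: total number of aligned residues.
    """
    if aligned <= 0 or not 0 <= matches <= aligned:
        raise ValueError("counts must satisfy 0 <= matches <= aligned and aligned > 0")
    return (100 * matches) // aligned


# Counts quoted above for the protein of clone 21433858.
print(percent_floor(542, 543))  # identities
print(percent_floor(543, 543))  # positives
```

Integer arithmetic is used so that a near-perfect match such as 542 of 543 is never reported as 100%.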

The proteins of the invention encoded by clone 21433858 include the protein disclosed as
being encoded by the ORF described herein, as well as any mature protein arising therefrom as a
result of post-translational modifications. Thus, the proteins of the invention encompass both a
precursor and any active forms of the clone 21433858 protein.

SECP2 

A SECP2 nucleic acid and polypeptide according to the invention includes a nucleic acid
sequence (SEQ ID NO:3) and an encoded polypeptide sequence (SEQ ID NO:4) of clone
11618130.0.27, also called CG50817-03. FIG. 2 illustrates the nucleic acid sequence and amino
acid sequence, as well as the alignment between these two sequences.

This clone includes a nucleotide sequence (SEQ ID NO:3) of 1894 nucleotides. The
nucleotide sequence includes an open reading frame (ORF) encoding a polypeptide of 267 amino
acid residues with a predicted molecular weight of 28043 Daltons. The start codon is at
nucleotides 732-734 and the stop codon is at nucleotides 1534-1536. The protein encoded by
clone 11618130.0.27 is predicted by the PSORT program to localize in the microbody
(peroxisome) with a certainty of 0.5035. The program SignalP predicts that there is no signal
peptide in the encoded polypeptide.

A search of the sequence databases using BLASTP and BLASTX reveals that clone
11618130.0.27 has 330 of 333 residues (99%) identical to and positive with a 571 residue human
protein termed PRO351 (PCT Publication WO9946281-A2 published September 16, 1999). In
addition, it was found to have 83 of 250 residues (33%) identical to, and 119 of 250 residues
(47%) positive with the 343 residue human prostasin precursor (EC 3.4.21.-) (SWISSPROT-
ACC:Q16651).

The proteins of the invention encoded by clone 11618130.0.27 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modification. Thus, the proteins of the invention
encompass both a precursor and any active forms of the 11618130.0.27 protein.

SECP3

A SECP3 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:5) and encoded polypeptide sequence (SEQ ID NO:6) of clone
11696905-0-47. FIG. 3 illustrates the nucleic acid sequence and amino acid sequence, as well as
the alignment between these two sequences.

Clone 11696905-0-47 was obtained from fetal brain. In addition, RNA sequences were
also found to be present in tissues including uterus, pregnant and non-pregnant uterus, ovarian
tumor, placenta, bone marrow, hippocampus, synovial membrane, fetal heart, fetal lung, pineal
gland and melanocytes. This clone includes a nucleotide sequence of 1855 bp (SEQ ID NO:5).
The nucleotide sequence includes an open reading frame (ORF) encoding a polypeptide of 405
amino acid residues (SEQ ID NO:6) with a predicted molecular weight of 44750 Daltons. The
start codon is located at nucleotides 154-156 and the stop codon is located at nucleotides 1369-
1371. The protein encoded by clone 11696905-0-47 is predicted by the PSORT program to
localize extracellularly with a certainty of 0.7332. The program SignalP predicts that there is a
signal peptide with the most probable cleavage site located between residues 25 and 26, in the
sequence AQG-GP.
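
The notation "between residues 25 and 26, in the sequence AQG-GP" can be read as a split of the precursor into a 25-residue signal peptide and a mature chain. The sketch below illustrates that reading only. The sequence used is a hypothetical placeholder (X marks unspecified residues), not SEQ ID NO:6, and the function name split_at_cleavage_site is an assumption of this sketch.

```python
def split_at_cleavage_site(precursor: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a precursor into (signal peptide, mature chain).

    last_signal_residue is the 1-based position of the final residue of the
    signal peptide, e.g. 25 for a cleavage site "between residues 25 and 26".
    """
    if not 0 < last_signal_residue < len(precursor):
        raise ValueError("cleavage site must fall inside the precursor")
    return precursor[:last_signal_residue], precursor[last_signal_residue:]


# Placeholder precursor: residues 23-25 are AQG and residues 26-27 are GP,
# matching the motif quoted in the text; all other residues are unknown (X).
placeholder = "M" + "X" * 21 + "AQG" + "GP" + "X" * 5

signal, mature = split_at_cleavage_site(placeholder, 25)
print(signal[-3:] + "-" + mature[:2])
```

The printed motif has the same form as the cleavage-site notation used throughout this section, with the hyphen marking the scissile bond.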

Real-time gene expression analysis was performed on SECP3 (clone 11696905-0-47).
The results demonstrate that RNA sequences homologous to clone 11696905-0-47 are detected
in various cell types. Cell types include adipose, adrenal gland, thyroid, brain, heart, skeletal
muscle, bone marrow, colon, bladder, liver, lung, mammary gland, placenta, and testis, and
neoplastic cells derived from renal carcinoma A498, lung carcinoma NCI-H460, and melanoma
SK-MEL-28.

Accordingly, SECP3 nucleic acids according to the invention can be used to identify one
or more of these cell types. The presence of RNA sequences homologous to a SECP3 nucleic acid in
a sample indicates that the sample contains one or more of the above cell types.

A search of the sequence databases using BLASTX reveals that clone 11696905-0-47 has
403 of 405 residues (99%) identical to, and 404 of 405 residues (99%) positive with, the 405
residue human angiopoietin-related protein (SPTREMBL-ACC:Q9Y5B3). Angiopoietin
homologues are useful to stimulate cell growth and tissue development. The polypeptides of
clone 11696905-0-47 tend to be found as multimeric proteins (see Example 7) and are believed
to have angiogenic or hematopoietic activity. They can thus be used in assays for angiogenic
activity, as well as used therapeutically to stimulate restoration of vascular structure in various
tissues. Examples of such uses include, but are not limited to, treatment of full-thickness skin
wounds, including venous stasis ulcers and other chronic, non-healing wounds, as well as
fracture repair, skin grafting, reconstructive surgery, and establishment of vascular networks in
transplanted cells and tissues.

The proteins of the invention encoded by clone 11696905-0-47 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 11696905-0-47 protein.

SECP4 

A SECP4 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:7) and encoded polypeptide sequence (SEQ ID NO:8) of clone
14578444.0.143. FIG. 4 illustrates the nucleic acid sequence and amino acid sequence, as well
as the alignment between these two sequences.

Clone 14578444.0.143 was obtained from fetal brain. This clone includes a nucleotide
sequence (SEQ ID NO:7) of 3026 bp. The nucleotide sequence includes an open reading frame
(ORF) encoding a polypeptide of 776 amino acid residues (SEQ ID NO:8) with a predicted
molecular weight of 86220.8 Daltons. The start codon is located at nucleotides 55-57 and the
stop codon is located at nucleotides 2384-2386. The protein encoded by clone 14578444.0.143
is predicted by the PSORT program to localize in the endoplasmic reticulum (membrane) with a
certainty of 0.8200. The program SignalP predicts that there is a signal peptide with the most
probable cleavage site located between residues 23 and 24 in the sequence AEA-RE.

A search of the sequence databases using BLASTX reveals that clone 14578444.0.143
has 655 of 757 residues (86%) identical to, and 702 of 757 residues (92%) positive with, the 956
residue murine matrilin-2 precursor protein (SWISSPROT-ACC:O08746), extending over
residues 1-754 of the reference protein. Additional similarities are found with lower identities in
residues 649-837 of the murine protein. Additionally, the search shows that there is a lower
degree of similarity to murine matrilin-4 precursor. The protein of clone 14578444.0.143 also
has 595 of 606 residues (98%) identical to, and 598 of 606 residues (98%) positive with, the 632
residue human matrilin-3 (PCT publication WO9904002-A1).

The matrilin proteins and polynucleotides can be used for treating a variety of
developmental disorders (e.g., renal tubular acidosis, anemia, Cushing's syndrome). The proteins
can serve as targets for antagonists that should be of use in treating diseases related to abnormal
vesicle trafficking. These may include, but are not limited to, diseases such as cystic fibrosis,
glucose-galactose malabsorption syndrome, hypercholesterolemia, diabetes mellitus, diabetes
insipidus, hyper- and hypoglycemia, Graves' disease, goiter, Cushing's disease, Addison's
disease, gastrointestinal disorders including ulcerative colitis, gastric and duodenal ulcers, and
other conditions associated with abnormal vesicle trafficking including AIDS, and allergies
including hay fever, asthma, and urticaria (hives), autoimmune hemolytic anemia, proliferative
glomerulonephritis, inflammatory bowel disease, multiple sclerosis, myasthenia gravis,
rheumatoid and osteoarthritis, scleroderma, Chediak-Higashi and Sjogren's syndromes, systemic
lupus erythematosus, toxic shock syndrome, traumatic tissue damage, and viral, bacterial,
fungal, helminth, protozoal infections, a neoplastic disorder (e.g., adenocarcinoma, leukemia,
lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and cancers), or an immune
disorder (e.g., AIDS, Addison's disease, adult respiratory distress syndrome, allergies, anemia,
asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease and ulcerative colitis).

The proteins of the invention encoded by clone 14578444.0.143 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the proteins encoded by clone
14578444.0.143 (SECP4).

SECP5 

A SECP5 nucleic acid and polypeptide according to the invention includes the nucleic 
acid sequence (SEQ ID NO:9) and encoded polypeptide sequence (SEQ ID NO: 10) of clone 
25 14578444.0.47. FIG. 5 illustrates the nucleic acid sequence and amino acid sequence, as well as 
the alignment between these two sequences. 

Clone 14578444.0.47 was obtained from fetal brain. This clone includes a nucleotide
sequence (SEQ ID NO:9) of 3447 bp. The nucleotide sequence includes an open reading frame
(ORF) encoding a polypeptide of 959 amino acid residues (SEQ ID NO:10) with a predicted
molecular weight of 107144 Daltons. The start codon is located at nucleotides 55-57 and the
stop codon is located at nucleotides 2933-2935. The protein encoded by clone 14578444.0.47 is
predicted by the PSORT program to localize to the endoplasmic reticulum (membrane) with a
certainty of 0.8200. The program SignalP predicts that there is a signal peptide with the most
probable cleavage site located between residues 23 and 24 in the sequence AEA-RE.
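
A predicted cleavage site "between residues 23 and 24" divides the precursor into a 23-residue signal peptide and a mature chain beginning at residue 24. The following is a minimal sketch of that split. The function name and the toy sequence are hypothetical and serve only as an illustration; the toy sequence is not the sequence of clone 14578444.0.47 and was built only so that residues 21-25 read AEA-RE.

```python
from typing import Tuple


def split_at_cleavage(precursor: str, last_signal_residue: int) -> Tuple[str, str]:
    """Split a precursor into (signal peptide, mature chain).

    `last_signal_residue` is the 1-based position of the final
    signal-peptide residue, e.g. 23 for a site between residues 23 and 24.
    """
    if not 0 < last_signal_residue < len(precursor):
        raise ValueError("cleavage position must fall inside the sequence")
    return precursor[:last_signal_residue], precursor[last_signal_residue:]


# Hypothetical 30-residue toy precursor (assumption, not patent data).
toy_precursor = "M" + "L" * 19 + "AEA" + "REKPQGS"

signal, mature = split_at_cleavage(toy_precursor, 23)
print(signal[-3:], mature[:2])
```

The same call with position 20 or 39 would model the cleavage sites reported for the other clones, given their actual sequences.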

A search of the sequence databases using BLASTX reveals that clone 14578444.0.47 has
829 of 959 residues (86%) identical to, and 887 of 959 residues (92%) positive with, the 956
residue murine matrilin-2 precursor protein (SWISSPROT-ACC:O08746). The protein
encoded by clone 14578444.0.47 also has 594 of 606 residues (98%) identical to, and 597 of 606
residues (98%) positive with, the 632 residue human matrilin-3 (PCT publication WO9904002).
In addition, the protein encoded by clone 14578444.0.47 has 616 of 678 residues (90%)
identical to, and 632 of 678 residues (93%) positive with, the 915 residue human protein PRO219
(PCT publication WO9914328-A2).
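
The identity and positive percentages quoted above are ratios of matching residues to aligned residues. As a rough check, the sketch below recomputes them to one decimal place from the counts given in the text. The helper name `percent` and the dictionary layout are illustrative assumptions; only the counts come from the text.

```python
def percent(matches: int, aligned: int) -> float:
    """Return matches/aligned as a percentage rounded to one decimal place."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return round(100.0 * matches / aligned, 1)


# (identical, positive, aligned length) as quoted for clone 14578444.0.47
hits = {
    "murine matrilin-2 precursor": (829, 887, 959),
    "human matrilin-3": (594, 597, 606),
    "human PRO219": (616, 632, 678),
}

for name, (identical, positive, aligned) in hits.items():
    print(f"{name}: {percent(identical, aligned)}% identical, "
          f"{percent(positive, aligned)}% positive")
```

The text reports these values as whole-number percentages, so small differences from the one-decimal figures are expected.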

The proteins encoded by clones 14578444.0.143 (SECP4) and 14578444.0.47 (SECP5)
are compared in an amino acid residue alignment shown in FIG. 11. It can be seen that the main
portions of the two proteins, starting from their amino termini, are virtually identical, and that short
sequences in each, corresponding to the carboxyl-terminal sequence of the shorter protein, clone
14578444.0.143, differ from one another. Furthermore, clone 14578444.0.47 has an extended
carboxyl-terminal sequence that is missing in clone 14578444.0.143. Therefore, clones
14578444.0.143 (SECP4) and 14578444.0.47 (SECP5) are apparently related to one another as
splice variants, with respect to their sequences at the carboxyl-terminal ends.

The matrilin proteins and polynucleotides can be used for treating a variety of
developmental disorders (e.g., renal tubular acidosis, anemia, Cushing's syndrome). The proteins
can serve as targets for antagonists that should be of use in treating diseases related to abnormal
vesicle trafficking. These may include, but are not limited to, diseases such as cystic fibrosis,
glucose-galactose malabsorption syndrome, hypercholesterolaemia, diabetes mellitus, diabetes
insipidus, hyper- and hypoglycemia, Graves disease, goiter, Cushing's disease, Addison's
disease, gastrointestinal disorders including ulcerative colitis, gastric and duodenal ulcers, and
other conditions associated with abnormal vesicle trafficking including AIDS, and allergies
including hay fever, asthma, and urticaria (hives), autoimmune hemolytic anemia, proliferative
glomerulonephritis, inflammatory bowel disease, multiple sclerosis, myasthenia gravis,
rheumatoid and osteoarthritis, scleroderma, Chediak-Higashi and Sjogren's syndromes, systemic
lupus erythematosus, toxic shock syndrome, traumatic tissue damage, and viral, bacterial,
fungal, helminth, protozoal infections, a neoplastic disorder (e.g., adenocarcinoma, leukemia,
lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and cancers), or an immune
disorder, (e.g., AIDS, Addison's disease, adult respiratory distress syndrome, allergies, anemia,
asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease and ulcerative colitis).

The proteins of the invention encoded by clone 14578444.0.47 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the proteins encoded by clone
14578444.0.47 (SECP5).

SECP6 

A SECP6 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:11) and encoded polypeptide sequence (SEQ ID NO:12) of clone
14998905.0.65. FIG. 6 illustrates the nucleic acid sequence and amino acid sequence, as well as
the alignment between these two sequences.

Clone 14998905.0.65 was obtained from lymphoid tissue, in particular, from the lymph
node. This clone includes a nucleotide sequence (SEQ ID NO:11) of 967 bp. The nucleotide
sequence includes an open reading frame (ORF) encoding a polypeptide of 245 amino acid
residues (SEQ ID NO:12) with a predicted molecular weight of 27327.2 Daltons. The start
codon is located at nucleotides 166-168 and the stop codon is located at nucleotides 902-904.
The protein encoded by clone 14998905.0.65 is predicted by the PSORT program to localize in
the microbody (peroxisome) with a certainty of 0.7480. PSORT predicts that there is no
amino-terminal signal sequence. Conversely, the program SignalP predicts that there is a signal peptide
with the most probable cleavage site located between residues 20 and 21, in the sequence GIG-AE.

A search of the sequence databases using BLASTX reveals that clone 14998905.0.65 has
204 of 226 residues (90%) identical to, and 214 of 226 residues (94%) positive with, the 834
residue murine semaphorin 4C precursor protein (SWISSPROT-ACC:Q64151). Semaphorin 4C
is indicated as being a Type I membrane protein widely expressed in the nervous system during
development. In addition, it contains one immunoglobulin-like C2-type domain. The protein
encoded by clone 14998905.0.65 also has similarities to mouse CD100 antigen (PCT publication
WO9717368-A1) and to human semaphorin (JP10155490-A).

The proteins of the invention encoded by clone 14998905.0.65 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 14998905.0.65 protein.

SECP7

A SECP7 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:13) and encoded polypeptide sequence (SEQ ID NO:14) of clone
16406477.0.206. FIG. 7 illustrates the nucleic acid sequence and amino acid sequence, as well
as the alignment between these two sequences.

Clone 16406477.0.206 was obtained from testis. In addition, sequences of clone
16406477.0.206 were also found in an RNA pool derived from adrenal gland, mammary gland,
prostate gland, testis, uterus, bone marrow, melanoma, pituitary gland, thyroid gland and spleen.
This clone includes a nucleotide sequence (SEQ ID NO:13) of 1359 bp with an open
reading frame (ORF) encoding a polypeptide of 385 amino acid residues (SEQ ID NO:14) with a
predicted molecular weight of 43087.3 Daltons. The start codon is located at nucleotides 45-47
and the stop codon is located at nucleotides 1201-1203. The protein encoded by clone
16406477.0.206 is predicted by the PSORT program to localize extracellularly with a certainty
of 0.5804 and to have a cleavable amino-terminal signal sequence. The program SignalP
predicts that there is a signal peptide with the most probable cleavage site located between
residues 39 and 40, in the sequence CWG-AG.

Real-time expression analysis was performed on SECP7 (clone 16406477.0.206). The
results demonstrate that RNA homologous to this clone is found in multiple cell and tissue types.
These cells and tissues include brain, mammary gland, and testis, as well as neoplastic cells derived
from ovarian carcinoma OVCAR-3, ovarian carcinoma OVCAR-5, ovarian carcinoma OVCAR-8,
ovarian carcinoma IGROV-1, breast carcinoma (pleural effusion) T47D, breast carcinoma BT-549,
and melanoma M14. Real-time gene expression analysis was performed on SECP3 (clone
11696905-0-47). The results demonstrate that RNA sequences homologous to clone 11696905-0-47
are detected in various cell types. Cell types include adipose, adrenal gland, thyroid, brain,
heart, skeletal muscle, bone marrow, colon, bladder, liver, lung, mammary gland, placenta, and
testis, as well as neoplastic cells derived from renal carcinoma A498, lung carcinoma NCI-H460,
and melanoma SK-MEL-28.

Accordingly, SECP7 nucleic acids according to the invention can be used to identify one
or more of these cell types. The presence of RNA sequences homologous to a SECP7 nucleic acid in
a sample indicates that the sample contains one or more of the above cell types.

A search of the sequence databases using BLASTX reveals that clone 16406477.0.206 is
100% identical to a human testis-specific protein TSP50 (SPTREMBL-ACC:Q9UI38) with a
trypsin/chymotrypsin-like domain. In addition, the protein encoded by clone 16406477.0.206
has low similarity to the 343 residue human prostasin precursor (EC 3.4.21.-)
(SWISSPROT-ACC:Q16651).

The proteins of the invention encoded by clone 16406477.0.206 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 16406477.0.206 protein.

SECP8 

A SECP8 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:15) and encoded polypeptide sequence (SEQ ID NO:16) of clone
11618130.0.184. FIG. 8 illustrates the nucleic acid sequence and amino acid sequence, as well
as the alignment between these two sequences.

Clone 11618130.0.184 includes a nucleotide sequence (SEQ ID NO:15) of 1445 bp. The
nucleotide sequence includes an open reading frame (ORF) encoding a polypeptide of 198 amino
acid residues (SEQ ID NO:16) with a predicted molecular weight of 20659 Daltons. The start
codon is located at nucleotides 732-734 and the stop codon is located at nucleotides 1326-1328.
The protein encoded by clone 11618130.0.184 is predicted by the PSORT program to localize in
the cytoplasm. The program SignalP predicts that there is no signal peptide.
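
The start codon position, polypeptide length, and stop codon position quoted for an ORF are related by simple arithmetic: each residue, including the initiating methionine, occupies three nucleotides, and the stop codon follows the last residue's codon. The sketch below recomputes the stop codon position for clone 11618130.0.184 from the quoted start position and length. The function name is hypothetical, and the calculation assumes an uninterrupted ORF with 1-based nucleotide numbering.

```python
def expected_stop_start(start_codon_pos: int, n_residues: int) -> int:
    """Return the 1-based position of the first stop-codon nucleotide.

    Assumes an uninterrupted ORF: the start codon occupies
    start_codon_pos..start_codon_pos + 2 and each residue uses one codon.
    """
    if start_codon_pos < 1 or n_residues < 1:
        raise ValueError("positions and lengths must be positive")
    return start_codon_pos + 3 * n_residues


# Values quoted in the text for clone 11618130.0.184 (SECP8).
start, residues = 732, 198
stop = expected_stop_start(start, residues)
print(f"stop codon expected at nucleotides {stop}-{stop + 2}")
```

For this clone the computed range agrees with the stop codon position stated in the text.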

Clones 11618130.0.184 (SECP8) and 11618130.0.27 (SECP2) resemble each other in
that they are identical over most of their common sequences, and differ only at the
carboxyl-terminal end. In addition, clone 11618130.0.27 extends further at the carboxyl-terminal end than
does clone 11618130.0.184. An alignment of clones 11618130.0.27 and 11618130.0.184 is
shown in FIG. 10.

The proteins of the invention encoded by clone 11618130.0.184 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the 11618130.0.184 protein.

SECP9 

A SECP9 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:17) and encoded polypeptide sequence (SEQ ID NO:18) of clone
21637262.0.64. FIG. 9 illustrates the nucleic acid sequence and amino acid sequence, as well as
the alignment between these two sequences.

Clone 21637262.0.64 was obtained from salivary gland. This clone includes a nucleotide
sequence (SEQ ID NO:17) of 1600 bp. The nucleotide sequence includes an open reading frame
(ORF) encoding a polypeptide of 435 amino acid residues (SEQ ID NO:18) with a predicted
molecular weight of 47162.5 Daltons. The start codon is located at nucleotides 51-53 and the
stop codon is located at nucleotides 1356-1358. The protein encoded by clone 21637262.0.64 is
predicted by the PSORT program to localize in the cytoplasm with a certainty of 0.4500. Both
PSORT and SignalP predict that the protein has no amino-terminal
signal sequence.

Real-time expression analysis was performed on SECP9 (clone 21637262.0.64). The
results demonstrate that RNA homologous to this clone is present in multiple tissue and cell
types. The relative amounts of RNA in various cell types are shown in FIG. 14 (see also the
Examples, below). The cells include myometrium, placenta, uterus, prostate, and testis, and
neoplastic cells derived from breast carcinoma (pleural effusion) T47D, breast carcinoma
(pleural effusion) MDA-MB-231, breast carcinoma BT-549, ovarian carcinoma OVCAR-3,
ovarian carcinoma OVCAR-5, prostate carcinoma (bone metastases) PC-3, melanoma M14, and
melanoma LOX IMVI.

Accordingly, SECP9 nucleic acids according to the invention can be used to identify one
or more of these cell types. The presence of RNA sequences homologous to a SECP9 nucleic acid in
a sample indicates that the sample contains one or more of the above cell types.

A search of the sequence databases using BLASTX reveals that clone 21637262.0.64 has
123 of 420 residues (29%) identical to, and 201 of 420 residues (47%) positive with, the 1130
residue murine protein repetin (SWISSPROT-ACC:P97347). Repetin, an epidermal differentiation
protein, is a member of the "fused gene" subgroup within the S100 gene family.

The proteins of the invention encoded by clone 21637262.0.64 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 21637262.0.64 protein.

SECP10 

A SECP10 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:40) and encoded polypeptide sequence (SEQ ID NO:41) of clone
CG106318. FIG. 15 illustrates the nucleic acid and amino acid sequences. This clone
includes a nucleotide sequence (SEQ ID NO:40) of 4810 bp. The nucleotide sequence includes
an open reading frame (ORF) encoding a polypeptide of 1588 amino acid residues (SEQ ID
NO:41). The start codon is located at nucleotides 18-21 and the stop codon is located at
nucleotides 4782-4785. The protein encoded by clone CG106318-01 is predicted by the PSORT
program to localize in the nucleus with a certainty of 0.3500. Both PSORT and
SignalP predict that the protein has no amino-terminal signal sequence.

Real-time expression analysis was performed on SECP10 (clone CG106318). The results
demonstrate that RNA homologous to this clone is present in multiple tissue and cell types.

Accordingly, SECP10 nucleic acids according to the invention can be used to identify
one or more of these tissue types. The presence of RNA sequences homologous to a SECP10
nucleic acid in a sample indicates that the sample contains one or more of the above tissue types.

A search of the sequence databases using BLASTX reveals that clone CG106318 has
1587 out of 1588 (99.9%) of its residues identical to a human protein utilized in the treatment of
central nervous system disorders (AAM39295 to HYSEQ INC.).

The proteins of the invention encoded by clone CG106318-01 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone CG106318-01 protein.

PSORT — Prediction of Protein Translocation Sites version 5.8 
Results Summary: 

plasma membrane                   Certainty=0.7000 (Affirmative) < succ>
nucleus                           Certainty=0.3500 (Affirmative) < succ>
microbody (peroxisome)            Certainty=0.3000 (Affirmative) < succ>
endoplasmic reticulum (membrane)  Certainty=0.2000 (Affirmative) < succ>
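
A results summary of this kind can be ranked programmatically by certainty. The following is a minimal sketch that parses the four lines above and sorts the predicted sites from most to least certain. The regular expression and function name are illustrative assumptions and are not part of PSORT; the input text reproduces the summary with OCR spacing repaired.

```python
import re

# The four summary lines, with OCR spacing repaired.
psort_summary = """\
plasma membrane                   Certainty=0.7000 (Affirmative)
nucleus                           Certainty=0.3500 (Affirmative)
microbody (peroxisome)            Certainty=0.3000 (Affirmative)
endoplasmic reticulum (membrane)  Certainty=0.2000 (Affirmative)
"""

# Site name, whitespace, then "Certainty=" followed by a decimal number.
LINE = re.compile(r"^(?P<site>.+?)\s+Certainty=\s*(?P<score>\d*\.\d+)")


def parse_psort(text):
    """Return (site, certainty) pairs sorted from most to least certain."""
    hits = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            hits.append((match["site"], float(match["score"])))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


ranked = parse_psort(psort_summary)
print(ranked[0])
```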

PFAM Domain Analysis 

Query: 106318-01

Scores for sequence family classification (score includes all domains):

Model       Description                          Score   E-value   N

tsp_1       Thrombospondin type 1 domain         169.5   5.4e-47  11
toxin       Snake toxin                          -16.1   1.3       1
DUF18       Domain of unknown function DUF18     -55.9   7.8       1
Keratin_B2  Keratin, high sulfur B2 protein      -81.1   6.6       1
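
Domain hits in a table like this are usually judged by the E-value column rather than by the raw score. The sketch below filters the four rows by an E-value cutoff. The cutoff of 0.01 is an assumed, commonly used threshold and is not stated in the source, and the fourth column is assumed to be the HMMER E-value.

```python
# (model, description, score, E-value, N) rows from the table above.
pfam_hits = [
    ("tsp_1", "Thrombospondin type 1 domain", 169.5, 5.4e-47, 11),
    ("toxin", "Snake toxin", -16.1, 1.3, 1),
    ("DUF18", "Domain of unknown function DUF18", -55.9, 7.8, 1),
    ("Keratin_B2", "Keratin, high sulfur B2 protein", -81.1, 6.6, 1),
]

E_VALUE_CUTOFF = 0.01  # assumed threshold, not taken from the source

# Keep only the models whose E-value falls below the cutoff.
significant = [
    model
    for model, _description, _score, e_value, _n in pfam_hits
    if e_value < E_VALUE_CUTOFF
]
print(significant)
```

Under this assumed cutoff only the thrombospondin type 1 domain model is retained.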

Sequences producing High-scoring Segment Pairs:                  Score  P(N)      N

gb:GENBANK-ID:AX079870|acc:AX079870.1 Sequence 1 from Pat        24050  0.0       1
gb:GENBANK-ID:AB023177|acc:AB023177.1 Homo sapiens mRNA f...     19495  0.0       1
gb:GENBANK-ID:AB051466|acc:AB051466.1 Homo sapiens mRNA f         3611  5.3e-269  6
gb:GENBANK-ID:AB006087|acc:AB006087.1 Danio rerio mRNA fo          272  0.16      1
gb:GENBANK-ID:AF111298|acc:AF111298.1 HIV-1 isolate eur-0          185  0.998     1

BLASTP: (1588 letters)

Database: Non-Redundant Composite Protein
704,847 sequences; 219,724,008 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done

                                                              Smallest
                                                              Sum
                                                     High     Probability
Sequences producing High-scoring Segment Pairs:      Score    P(N)        N

... 8965 0.0 1
... 7298 0.0 1
. 3983 0.0 1
ptnr:SPTREMBL-ACC:Q9UPZ6 KIAA0960 PROTEIN - Homo sapiens .
ptnr:SPTREMBL-ACC:Q9C0J4 KIAA1679 PROTEIN - Homo sapiens ...
ptnr:SPTREMBL-ACC:O60407 HYPOTHETICAL PROTEIN - Homo sapi... 3026 3.1e-315 1

TABLE 2. BLASTN VERSUS GENBANK COMPOSITE

Sequences producing High-scoring Segment Pairs:                  Score  P(N)      N

gb:GENBANK-ID:AX079870|acc:AX079870.1 Sequence 1 from Pat        24050  0.0       1
gb:GENBANK-ID:AB023177|acc:AB023177.1 Homo sapiens mRNA f...     19495  0.0       1
gb:GENBANK-ID:AB051466|acc:AB051466.1 Homo sapiens mRNA f         3611  5.3e-269  6
gb:GENBANK-ID:AB006087|acc:AB006087.1 Danio rerio mRNA fo          272  0.16      1
gb:GENBANK-ID:AF111298|acc:AF111298.1 HIV-1 isolate eur-0          185  0.998     1

>gb:GENBANK-ID:AX079870|acc:AX079870.1 Sequence 1 from Patent WO0105971 - Homo
sapiens, 6373 bp. (SEQ ID NO:58)
Length = 6373

Plus Strand HSPs:

Score = 24050 (3608.5 bits), Expect = 0.0, P = 0.0
Identities = 4810/4810 (100%), Positives = 4810/4810 (100%), Strand = Plus / Plus

[Alignment rows: Query nucleotides 1-4810 against Sbjct nucleotides 218-5027, identical at every position.]



Table 3. BLASTN VERSUS GENBANK COMPOSITE

>gb:GENBANK-ID:AB023177|acc:AB023177.1 Homo sapiens mRNA for KIAA0960 protein,
partial cds - Homo sapiens, 5032 bp. (SEQ ID NO:59)
Length = 5032

Plus Strand HSPs:

Score = 19495 (2925.0 bits), Expect = 0.0, P = 0.0
Identities = 3899/3899 (100%), Positives = 3899/3899 (100%), Strand = Plus / Plus
Query:   912 GAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCCCCTGCAGGCACTCGTGTA 971
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:     1 GAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCCCCTGCAGGCACTCGTGTA 60

Query:   972 AGGACACGAACCATCAGGCAGTTTCCCATTGGCAGTGAAAAGGAGTGTCCAGAATTTGAA 1031
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    61 AGGACACGAACCATCAGGCAGTTTCCCATTGGCAGTGAAAAGGAGTGTCCAGAATTTGAA 120

Query:  1032 GAAAAAGAACCCTGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCACGTATGGCTGG 1091
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   121 GAAAAAGAACCCTGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCACGTATGGCTGG 180

Query:  1092 AGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTCAGTCAGCAGGACAAGAGG 1151
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   181 AGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTCAGTCAGCAGGACAAGAGG 240

Query:  1152 CGCGGCAACCAGACGGCCCTCTGTGGAGGGGGCATCCAGACCCGAGAGGTGTACTGCGTG 1211
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   241 CGCGGCAACCAGACGGCCCTCTGTGGAGGGGGCATCCAGACCCGAGAGGTGTACTGCGTG 300

CAGGCCAACGAAAACCTCCTCTCACAATTAAGTACCCACAAGAAC AAAGAAGCCTCAAAG 1271 

III II Mi 1 1 II 1 1 1 1 1 1 1 II Mil I II 1 1! I II 1 1 1 II IN II 1 1 1 I M 1 1 Ml I II 1 1 

CAGGCCAACGAAAACCTCCTCTCACAATTAAGTACCCACAAGAACAAAGAAGCCTCAAAG 360 
CCAATGGACTTAAAATTATGCACTGGACCTATCCCTAATACTACACAGCTGTGCC ACATT 1331 

MMI 1 1 1 M II 1 1 1 1 1 M II 1 1 1 1 1 1 1 1 1 M 1 1 1 II I II I MIMI M 1 1 1 1 II M I M 

CCAATGGACTTAAAATTATGCACTGGACCTATCCCTAATACTACACAGCTGTGCC ACATT 420 
CCTTGTCCAACTGAATGTGAAGTTTCACCTTGGTCAGCTTGGGGACCTTGTACTTATGAA 1391 

MM II I II III II Ml M III II III I M Ml Ml 1 1 1 1 1 I II I II II I II M III II 

CCTTGTCCAACTGAATGTGAAGTTTC ACCTTGGTCAGCTTGGGGACCTTGTACTTATGAA 480 
AACTGTAATGATCAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGCGCATTACCAAT 1451 

Ml 1 1 Ml 1 1 I II I M I II II M M I II I II III M I M MMI I II 1 1 M 1 1 1 1 1 1 1 1 1 

AACTGTAATGATCAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGCGCATTACCAAT 540 
GAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCACTTACTGGAAGCCATTCCC 1511 

II I M 1 1 Ml Ml I II I M II M I II I II I M 1 1 III 1 1 1 1 1 M MM 1 1 III II MM I 

GAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCACTTACTGGAAGCCATTCCC 600 
TGTGAAGAGCCTGCCTGTTATGACTGGAAAGCGGTGAGACTGGGAGACTGCGAGCCAGAT 1571 

II M II Ml M 1 1 1 M II 1 1 II li II II I MM MM I II 1 1 II 1 1 1 II 1 1 1 II I II I 

TGTGAAGAGCCTGCCTGTTATGACTGGAAAGCGGTGAGACTGGGAGACTGCGAGCCAGAT 



AACGGAAAGGAGTGTGGTCCAGGCACGCAAGTTCAAGAGGTTGTGTGCATCAACAGTGAT 

MMIIIMI MM MMMMI IMMIMIII IIIIMM MMMMMMIMM 

AACGGAAAGGAGTGTGGTCCAGGCACGCAAGTTCAAGAGGTTGTGTGCATCAACAGTGAT 



660 



1631 



720 



GGAGAAGAAGTTGAC AGACAGCTGTGCAGAGATGCCATCTTCCCCATCCCTGTGGCCTGT 1691 

MM M M I M I M 1 1 M II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 Mill 

GG AGAAG AAGTTGAC AGAC AGCTGTGCAG AG ATGCCATC TTC CCC ATCCCTGTGGCCTGT 780 
GATGCCCCATGCCCGAAAGACTGTGTGCTCAGCACATGGTCTACGTGGTCCTCCTGCTCA 1751 

II M II I II I M I M M M I I! M M M I MM MM II M II II M II II I M MM 

GATGCCCCATGCCCG AAAGACTGTGTGCTCAGCACATGGTCT ACGTGGTCCTCCTGCTCA 840 
CACACCTGCTCAGGGAAAACGACAGAAGGGAAACAGATACGAGCACGATCCATTCTGGCC 1811 

I M I II 1 1 1 1 1 1 M M I MMI MIMI II llllllll II II M MM M II M M M M 

CACACCTGCTCAGGGAAAACGACAGAAGGGAAACAGATACGAGCACGATCCATTCTGGCC 900 
TATGCGGGTGAAGAAGGTGGAATTCGCTGTCCAAATAGCAGTGCTTTGCAAGAAGTACGA 1871 

III IMMM IM III II II III MMM MMMMIMM MM MM MM II MM I 

TATGCGGGTG AAGAAGGTGGAATTCGCTGTCCAAATAGCAGTGCTTTGCAAGAAGTACGA 960 
AGCTGTAATGAGCATCCTTGCACAGTGTACCACTGGCAAACTGGTCCCTGGGGCCAGTGC 1931 

MMIM MMIMM IMIIIMM III II II II MIMM III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

AGCTGTAATGAGCATCCTTGCACAGTGTACCACTGGCAAACTGGTCCCTGGGGCC AGTGC 1020 
ATTGAGGACACCTCAGTATCGTCCTTCAACACAACTACGACTTGGAATGGGGAGGCCTCC 1991 

I MIMI II MM II M III II II 1 1 M II II I II I II I II M M I II II II M 1 1 MM 

1021 ATTGAGGACACCTCAGTATCGTCCTTCAACACAACTACGACTTGGAATGGGGAGGCCTCC 1080 
TGCTCTGTCGGCATGCAGACAAGAAAAGTCATCTGTGTGCGAGTCAATGTGGGCCAAGTG 2051 

III IMMIMM I Ml Ml MMI Mil MM 1 1 1 Ml I II MM 1 1 1 MM Ml II II 

TGCTCTGTCGGCATGCAGACAAGAAAAGTC ATCTGTGTGCGAGTC AATGTGGGCC AAGTG 1140 
GGACCCAAAAAATGTCCTGAAAGCCTTCGACCTGAAACTGTAAGGCCTTGTCTGCTTCCT 2111 

I M I II I III I MMI MMI II II II II II IM MM M M M | M ;! 1 1 MMM! I 

GGACCCAAAAAATGTCCTGAAAGCCTTCGACCTGAAACTGTAAGGCCTTGTCTGCTTCCT 1200 
TGTAAGAAGGACTGTATTGTGACCCCATATAGTG ACTGGACATCATGCCCCTCTTCGTGT 2171 

MMIM II I MM II MMIM III I II 1 1 1 1 1 1! I II Ml I II MM IMMMMII 

TGTAAGAAGGACTGTATTGTGACCCCATATAGTGACTGGACATCATGCCCCTCTTCGTGT 1260 

AAAGAAGGGGACTCCAGTATCAGGAAGCAGTCTAGGC ATCGGGTC ATCATTCAGCTGCCA 2231 

IIIIIIIIIIIIIIIIMIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIMIIIIIIIII 

AAAGAAGGGGACTCCAGTATCAGGAAGCAGTCTAGGCATCGGGTCATCATTCAGCTGCCA 1320 

GCCAACGGGGGCCGAGACTGCACAGATCCCCTCTATGAAGAGAAGGCCTGTGAGGCACCT 2291 

MMIM MMMMMMIMM MMM 1 1 1 i 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 
GCCAACGGGGGCCGAGACTGC ACAGATCCCCTCTATGAAGAGAAGGCCTGTGAGGCACCT 1380 
CAAGCGTGCCAAAGCTACAGGTGGAAGACTCACAAATGGCGCAGATGCCAATTAGTCCCT 2351 

I II II 1 1 !! II! 1 1! II ! I II II Ml I ! I M 1 1 I I MM M Ml 1 1 i! 1 1 1 1 1 1 Mi I 

CAAGCGTGCCAAAGCTACAGGTGGAAGACTCACAAATGGCGCAGATGCCAATTAGTCCCT 1440 
TGGAGCGTGC AAC AAGAC AGCCCTGGAGC ACAGGAAGGCTGTGGGCCTGGGCG ACAGGC A 2411 

IM 1 1 1 1 1 II M I Ml II I II II I II I Ml I M I II 1 1 1 1 II I M II I M M I Ml II I 

TGGAGCGTGC AACAAGACAGCCCTGGAGCACAGGAAGGCTGTGGGCCTGGGCGACAGGCA 1500 
AGAGCCATTACTTGTCGC AAGCAAGATGG AGGAC AGGCTGGAATCCATGAGTGCCTACAG 2471 

1 1 1 1 ! 1 1 1 1 II 1 1 1 i 1 1 1 1 M I M 1 1 1 ! 1 1 1 II 1 1 M 1 1 M 1 1 1 1 M 1 1 1 M M 

AGAGCCATTACTTGTCGC AAGCAAGATGGAGGACAGGCTGGAATCCATGAGTGCCTACAG 1560 
TATGC AGGCCCTGTGCCAGCCCTTACCCAGGCCTGCCAGATCCCCTGCCAGGATGACTGT 2531 

IIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIillllllllMIIIMI 

TATGCAGGCCCTGTGCCAGCCCTTACCCAGGCCTGCCAGATCCCCTGCCAGGATGACTGT 1620 
CAATTGACGAGCTGGTCCAAGTTTTCTTCATGCAATGGAGACTGTGGTGCAGTTAGGACC 2591 

M II II 1 1 1 1 1 1 1 II Ml II 1 1 1 1 1 1 1 1 II II M I III 1 1 1 1 II 1 1 1 MM I M II I M I 

CAATTGACCAGCTGGTCCAAGTTTTCTTCATGCAATGGAGACTGTGGTGCAGTTAGGACC 1680 
AGAAAGCGCACTCTTGTTGGAAAAAGTAAAAAGAAGGAAAAATGTAAAAATTCCCATTTG 2651 

Ml 1 1 1 1 III III II MM II I II II II M II 1 1 1 II II II M 1 1 II I Mill I MM II 

AGAAAGCGCACTCTTGTTGGAAAAAGTAAAAAGAAGGAAAAATGTAAAAATTCCCATTTG 1740 
TATCCCCTGATTGAGACTCAGTATTGTCCTTGTGACAAATATAATGCACAACCTGTGGGG 2711 

II! i IIIIIIIIIIIIIIIIIMIIIIIMIMIIIIIIIIIIIIIIIM 

TATCCCCTGATTGAGACTCAGTATTGTCCTTGTGACAAATATAATGCACAACCTGTGGGG 1800 
AACTGGTCAGACTGTATTTTACCAGAGGGAAAAGTGGAAGTGTTGCTGGGAATGAAAGTA 2771 

llllllll IIIIMIIIIMIIII MMlMIIII llllllllll MIIIMIIIIIIII 

AACTGGTCAGACTGTATTTTACCAGAGGGAAAAGTGG AAGTGTTGCTGGGAATGAAAGTA 1860 



C AAGG AG AC ATCAAGG AATG CGG AC AAGG AT AT CGTT AC C AAGC AAT GG C ATG CT ACG AT 

II I MM I II IM IMIM I II II Mill III I M I III MM I M II M III 1 1 II 1 1 1 

CAAGGAGACATCAAGGAATGCGGACAAGGATATCGTTACCAAGCAATGGCATGCTACGAT 



2831 



1920 



CAAAATGGCAGGCTTGTGGAAACATCTAGATGTAACAGCCATGGTTACATTGAGGAGGCC 2891 

1 1 MM I II 1 1 II IMMM II II II MMMM I M II III II IM I M II i M 1 1 1 1 1 

CAAAATGGC AGGC TTGTGGAAACATC TAG ATGT AACAGCC ATGGTTAC ATTGAGGAGGCC 1980 
TGCATCATCCCCTGCCCCTCAGACTGCAAGCTCAGTGAGTGGTCCAACTGGTCGCGCTGC 2951 

Mill Ml MMI II IMMM II II MM I IM MMM II MM MM I IMMM M 

TGCATCATCCCCTGCCCCTCAGACTGCAAGCTCAGTGAGTGGTCCAACTGGTCGCGCTGC 2040 
AGCAAGTCCTGTGGGAGTGGTGTGAAGGTTCGTTCTAAATGGCTGCGTGAAAAACCATAT 3011 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

AGCAAGTCCTGTGGGAGTGGTGTGAAGGTTCGTTCTAAATGGCTGCGTGAAAAACCATAT 2100 



mi 



1 1 1 1 1 



inn 



I IMIIMII II llllllllll! Mill MMM INI llllllll II 



nun ii ii in inn mini hum mi iiiiiiii 



III 



MM IMIII IIIIIIIIMIIIIIIIIII I 



3251 



AGATGCATGCAGAATACAGCAGATGGCCCTTCTGAACATGTAGAGGATTACCTCTGTGAC 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 
AGATGCATGCAGAATACAGCAGATGGCCCTTCTGAACATGTAGAGGATTACCTCTGTG AC 2340 

CCAGAAGAGATGCCCCTGGGCTCTAGAGTGTGCAAATTACCATGCCCTGAGGACTGTGTG 3311 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

CCAGAAGAGATGCCCCTGGGCTCTAGAGTGTGCAAATTACCATGCCCTGAGGACTGTGTG 2400 

ATATCTGAATGGGGTCCATGGACCCAATGTGTTTTGCCTTGCAATCAAAGCAGTTTCCGG 3371

lllllilMIMIIIIMMIIIIIIIIIMIMMIlllll HIM IIMIMIIIIM 

ATATCTGAATGGGGTCCATGGACCCAATGTGTTTTGCCTTGCAATCAAAGCAGTTTCCGG 2460 
CAAAGGTCAGCTGATCCCATC AGACAACCAGCTGATGAAGGAAGATCTTGCCCTAATGCT 3431 

1 1 1 1 1 1 1 1 M II II 1 1 1 II II II II MM II III 1 1 1 II II! Ill II III III II III I 

CAAAGGTCAGCTG ATCCCATCAGACAACCAGCTGATGAAGGAAGATCTTGCCCTAATGCT 2520 
GTTGAGAAAGAACCCTGTAACCTGAACAAAAACTGCTACCACTATGATTATAATGTAACA 3491 

II III II I II II II III 1 1 II II II INI II III I II II III III II III Ml III III I 

GTTGAGAAAGAACCCTGTAACCTGAACAAAAACTGCTACCACTATGATTATAATGTAACA 2580 
GACTGGAGTAC ATGTCAGCTGAGTGAGAAGGCAGTTTGTGGT^AATGGAATAAAAACAAGG 3551 

1 1 1 1 1 1 M 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 

GACTGGAGTACATGTCAGCTGAGTGAGAAGGGAGTTTGTGGAAATGGAATAAAAACAAGG 2640 
ATGTTGGATTGTGTTCGAAGTGATGGCAAGTCAGTTGACCTGAAATATTGTGAAGCGCTT 3611 

II I IMM !M 1 1 1 1 IM I M II III 1 1 1 1 1 1 1 1 1 1 MM1 M II II 1 1 1 1 1 1 1 1 Ml I 

ATGTTGGATTGTGTTCGAAGTGATGGCAAGTCAGTTGACCTGAAATATTGTGAAGCGCTT 2700 
GGCTTGGAGAAGAACTGGCAGATGAACACGTCCTGCATGGTGGAATGCCCTGTGAACTGT 3671 

Ml !l IM III 1 1 1 1 MM II IMM 1 1 1 1 1 1 1 1 1 MM II 1 1 1 1 II II 1 1 1 1 II I II I 

GGCTTGGAGAAGAACTGGC AGATGAACACGTCCTGCATGGTGGAATGCCCTGTGAACTGT 2760 



I II I M 1 1 I II M II II II II M II 1 1 M II I 1 1 M I III I II II M I III I II 1 1 1! 



ATGATCCGAAGACGAACAGTGACCCAGCCCTTTCAAGGTGATGGAAGACCATGCCCTTCC 3791 

1 1 1 II I Ml 1 1 MMI I Ml II 1 1 IM 1 1 1 1 M II I 1 1 1 1 M M 1 1 1 M I II II MM 

ATGATCCGAAGACGAACAGTGACCCAGCCCTTTCAAGGTGATGGAAGACCATGCCCTTCC 2880 
CTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTATCGGTGGCAATATGGCCAG 3851 

II I II II I II MMIM II II II II I MM I II I II MM II II MM I II II III M 1 1 

CTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTATCGGTGGCAATATGGCCAG 2940 
TGGTCTCCATGCCAAGTGCAGGAGGCCCAGTGTGGAGAAGGGACC AGAACAAGGAACATT 3911 

II I II II I II II I II II I III II II I MM I II 1 1 M II II M I MMM MMM II M 

TGGTCTCCATGCCAAGTGCAGGAGGCCCAGTGTGGAGAAGGGACC AGAACAAGGAACATT 3000 
TCTTGTGTAGTAAGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGGATGAGGAATTC 3971 

1 1 1 III MM 1 1 1 1 II 1 1 II M II III I M 1 1 1 1 II II 1 1 II M I M II 1 1 1 II I M I 

TCTTGTGTAGTAAGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGGATGAGGAATTC 3060 
TG TGC TG ACATTG AACT C ATT AT AG ATGGT AAT AAAAAT ATGG TT CTGG AGG AATCC TGC 4031 

M M 1 1 II ill M II II I MM I II I IM I II 1 1 MMI II I M I M II I Ml M I MM 

TG TGC TG ACATTG AAC TC ATT AT AG ATGG T AAT AAAAAT ATGG TTCTGG AGG AATCCTGC 3120 
AGCCAGCCTTGCCCAGGTGACTGTTATTTGAAGGACTGGTCTTCCTGGAGCCTGTGTCAG 4091 

MINIUM! IMIIIIIIimilllllllllMIIIIIMIIII IIIIIIIIIMII 

AGCCAGCCTTGCCCAGGTGACTGTTATTTGAAGGACTGGTCTTCCTGGAGCCTGTGTCAG 3180 
CTGACCTGTGTGAATGGTGAGGATCTAGGCTTTGGTGGAATACAGGTCAGATCCAGACCG 4151 

II II 1 1 1 MM M MM II II II MM MM M 1 1 II 1 1 II 1 1 II 1 1 1 1 1 1 1 1 II I II II 

CTGACCTGTGTGAATGGTGAGGATCTAGGCTTTGGTGGAATACAGGTCAGATCCAGACCG 3240 
GTGATTATACAAGAACTAGAGAATCAGCATCTGTGCCCAGAGCAGATGTTAGAAACAAAA 4211 

IMM I II 1 1 III II II 1 1 II MM MM IM 1 1 MMIM II II II III MMM II II 

GTGATTATACAAGAACTAGAGAATCAGCATCTGTGCCCAGAGCAGATGTTAGAAACAAAA 3300 
TC ATGTTATGATGGACAGTGCTATGAATATAAATGGATGGCCAGTGCTTGGAAGGGCTCT 4271 

1 1 1 1 1 I MM I Ml II 1 1 IMMIM 1 1 M I M II I IM I M II 1 1 I II 1 1 1 M Mill 

TCATGTTATGATGGACAGTGCTATGAATATAAATGGATGGCCAGTGCTTGGAAGGGCTCT 3360 
TCCCGAACAGTGTGGTGTCAAAGGTC AGATGGTATAAATGTAACAGGGGGCTGCTTGGTG 4331 

llllllllll IIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIII M MM III I II 

TCCCGAACAGTGTGGTGTCAAAGGTCAGATGGTATAAATGTAACAGGGGGCTGCTTGGTG 3420 
ATGAGCCAGCCTGATGCCGACAGGTCTTGTAACCCACCGTGTAGTCAACCCCACTCGTAC 4391 

1 1 1 1 M I i I M 1 1 II 1 1 1 1 1 1 1 1 1 M 1 1 1 II I II M I! M 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 II 1 1 1 1 
3421 ATGAGCCAGCCTGATGCCGACAGGTCTTGTAACCCACCGTGTAGTCAACCCCACTCGTAC 3480
rGTAGCGAGACAAAAACATGCCATTGTGAAGAAGGGTACACTGAAGTCATGTCTTCTAAC 4451 

f 1 1 1 1 1 M 1 1 1 ! 1 1 M 1 1 1 1 1 M I M 1 1 1 1 M 1 1 1 ! ! 1 1 1 1 1 M 1 1 1 M 1 1 1 ( M 1 1 1 ! [ 

rGTAGCGAGACAAAAAC ATGCCATTGTGAAGAAGGGTACACTGAAGTCATGTCTTCTAAC 3540 
^GCACCCTTGAGCAATGCACACTTATCCCCGTGGTGGTATTACCCACCATGGAGGACAAA 4511 

1 1 MMIII I III I II I II III 1 1 1 Mill I III Mil III I II III III II 1 1 1 II I II 

AGCACCCTTGAGCAATGCACACTTATCCCCGTGGTGGTATTACCCACCATGGAGGACAAA 3600 
AGAGGAGATGTGAAAACCAGTCGGGCTGTAC ATCCAACCCAACCCTCCAGTAACCCAGCA 4571 

II 1 1 1 MM I II 1 1 1 1 II I II 1 1 1 1 1 1 1 1 II III II III III III II I II II 1 1 1 III II 

AGAGGAGATGTGAAAACCAGTCGGGCTGTACATCCAACCCAACCCTCCAGTAACCCAGCA 3660 
3GACGGGGAAGGACCTGGTTTCTACAGCCATTTGGGCCAGATGGGAGACTAAAGACCTGG 4631 

1 1 III II II I II IMMI Ml II III III I II Mill I MM II MMIII I III II I II 

3GACGGGGAAGGACCTGGTTTCTACAGCCATTTGGGCCAGATGGGAGACTAAAGACCTGG 3720 
3TTTACGGTGTAGCAGCTGGGGCATTTGTGTTACTCATCTTTATTGTCTCCATGATTTAT 4691 

20 | M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 
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1 1 1 1 1 II II 1 1 1 II 1 1 II 1 1 1 1 1 M 1 1 II II 1 1 1 1 1 II II 1 1 1 1 II Ml 1 1 1 1 1 1 II 1 1 

CTAGCTTGCAAAAAGCCAAAGAAACCCCAAAGAAGGCAAAACAACCGACTGAAACCTTTi 
ACCTTAGCCTATGATGGAGATGCCGACATGTAACATATAACTTTTCCTGGCAACAACCA 

I IIM1 1 1 1 1 1 M M 1 1 1 1 1 1 M 1 1 1 II I MIMI 1 1 1 IM III Ml I II M 1 1 1 1 1 I 



SECP11 

A SECP11 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:42) and encoded polypeptide sequence (SEQ ID NO:43) of clone
CG50817-04, directed toward novel peptidase (HPEP-8)-like proteins and nucleic acids encoding
them. FIG. 16 illustrates the nucleic acid and amino acid sequences. This clone
includes a nucleotide sequence (SEQ ID NO:42) of 1447 bp. The nucleotide sequence includes
an open reading frame (ORF) beginning with an ATG initiation codon and encoding a polypeptide of
224 amino acid residues (SEQ ID NO:43). The start codon is located at nucleotides 520-522 and
the stop codon is located at nucleotides 1192-1194. Putative untranslated regions, if any, are
found upstream from the initiation codon and downstream from the termination codon. The
protein encoded by clone CG50817-04 is predicted by the PSORT program to localize in the
cytoplasm with a certainty of 0.4500. The PSORT and SignalP programs predict that the
protein has no amino-terminal signal sequence.
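
As a quick consistency check on the coordinates above, the following Python sketch computes the polypeptide length implied by the start and stop codon positions. The function name `orf_protein_length` is hypothetical and introduced only for illustration; it assumes 1-based inclusive coordinates and that the stop codon is not translated.

```python
def orf_protein_length(start_first_nt: int, stop_last_nt: int) -> int:
    """Return the number of amino acids encoded by an ORF.

    Coordinates are 1-based and inclusive: the ORF runs from the first
    nucleotide of the start codon to the last nucleotide of the stop codon.
    """
    span = stop_last_nt - start_first_nt + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    # The final codon is the stop codon and contributes no residue.
    return span // 3 - 1


# Start codon at 520-522, stop codon at 1192-1194 (clone CG50817-04).
print(orf_protein_length(520, 1194))  # 224
```

The result agrees with the 224 amino acid residues stated for SEQ ID NO:43.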

Novel peptidase (HPEP-8)-like proteins are related to conditions of failure to thrive,
nutritional edema, and hypoproteinemia with normal sweat electrolytes, as reported by Townes et
al. (J. Pediat. 71: 220-224, 1967) for 2 affected male infants. This condition could be treated by a
protein hydrolysate diet. Morris and Fisher (Am. J. Dis. Child. 114: 203-208, 1967) reported an
affected female who also had imperforate anus, a result of a defect in the synthesis of the
enterokinase which activates proteolytic enzymes produced by the pancreas. Oral pancreatin
represents a therapeutically successful form of enzyme replacement. Trypsin, like elastase, is a
member of the pancreatic family of serine proteases. MacDonald et al. (J. Biol. Chem. 257:
9724-9732, 1982) reported nucleotide sequences of cDNAs representing 2 pancreatic rat
trypsinogens. The trypsin gene is on mouse chromosome 6 (Honey et al., Somat. Cell Molec.
Genet. 10: 369-376, 1984). Carboxypeptidase A and trypsin are a syntenic pair conserved in
mouse and man. Emi et al. (Gene 41: 305-310, 1986) isolated cDNA clones for 2 major human
trypsinogen isozymes from a pancreatic cDNA library. The deduced amino acid sequences had
89% homology and the same number of amino acids (247), including a 15-amino acid signal
peptide and an 8-amino acid activation peptide. Southern blot analysis of human genomic DNA
with the cloned cDNA as a probe showed that the human trypsinogen genes constitute a family
of more than 10. The gene encoding trypsin-1 (TRY1) is also referred to as serine protease-1
(PRSS1). Rowen et al. (Science 272: 1755-1762, 1996) found that there are 8 trypsinogen genes
embedded in the beta T-cell receptor locus or cluster of genes (TCRB) mapping to 7q35. In the
685-kb DNA segment that they sequenced they found 5 tandemly arrayed 10-kb locus-specific
repeats (homology units) at the 3-prime end of the locus. These repeats exhibited 90 to 91%
overall nucleotide similarity, and embedded within each is a trypsinogen gene. Alignment of
pancreatic trypsinogen cDNAs with the germline sequences showed that these trypsinogen genes
contain 5 exons that span approximately 3.6 kb. They denoted 8 trypsinogen genes T1 through
T8 from 5-prime to 3-prime. Some of the trypsinogen genes are expressed in nonpancreatic
tissues where their function is unknown. Rowen et al. (Science 272: 1755-1762, 1996) noted that
the intercalation of the trypsinogen genes in the TCRB locus is conserved in mouse and chicken,
suggesting shared functional or regulatory constraints, as has been postulated for genes in the
major histocompatibility complex (such as class I, II, and III genes) that share similar long-term
organizational relationships. The gene of the invention encodes a novel serine protease containing
a trypsin domain but is localized on chromosome 16.

The sequence of the invention was derived by laboratory cloning of cDNA fragments
covering the full length and/or part of the DNA sequence of the invention, and/or by in silico
prediction of the full length and/or part of the DNA sequence of the invention from public human
sequence databases.

The laboratory cloning was performed using one or more of the methods summarized as:
SeqCalling™ Technology, where cDNA was derived from various human samples representing
multiple tissue types, normal and diseased states, physiological states, and developmental states
from different donors. Samples were obtained as whole tissue, cell lines, primary cells or tissue
cultured primary cells and cell lines. Cells and cell lines may have been treated with biological or
chemical agents that regulate gene expression, for example, growth factors, chemokines, steroids.
The cDNA thus derived was then sequenced using CuraGen's proprietary SeqCalling technology.
Sequence traces were evaluated manually and edited for corrections if appropriate. cDNA
sequences from all samples were assembled with one another and with public ESTs using
bioinformatics programs to generate CuraGen's human SeqCalling database of SeqCalling
assemblies. Each assembly contains one or more overlapping cDNA sequences derived from one
or more human samples. Fragments and ESTs were included as components for an assembly
when the extent of identity with another component of the assembly was at least 95% over 50 bp.
Each assembly can represent a gene and/or its variants such as splice forms and/or single
nucleotide polymorphisms (SNPs) and their combinations.
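
The inclusion criterion described above (at least 95% identity over 50 bp) can be illustrated with a small Python sketch. It is not the SeqCalling implementation, which is not disclosed here. It assumes the two sequences are already aligned without gaps and have equal length, and it reads "over 50 bp" as "within some window of 50 consecutive aligned positions". The function name `meets_inclusion_criterion` and its parameters are hypothetical.

```python
def meets_inclusion_criterion(a: str, b: str,
                              min_identity: float = 0.95,
                              window: int = 50) -> bool:
    """Check whether two pre-aligned, ungapped, equal-length sequences
    share at least `min_identity` identity within some `window`-bp stretch."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if len(a) < window:
        return False
    matches = [x == y for x, y in zip(a.upper(), b.upper())]
    needed = min_identity * window
    # Slide a window along the alignment, updating the match count in O(1).
    count = sum(matches[:window])
    if count >= needed:
        return True
    for i in range(window, len(matches)):
        count += matches[i] - matches[i - window]
        if count >= needed:
            return True
    return False


fragment = "ACGT" * 15                 # 60 bp
est = "TT" + fragment[2:]              # two mismatches at the 5' end
print(meets_inclusion_criterion(fragment, est))  # True
```

A production assembler would also need to handle gaps and reverse-complement overlaps, which this sketch deliberately leaves out.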

Exon Linking, where the cDNA coding for the sequence was cloned by polymerase chain
reaction (PCR) using the following primers: 5'-TGCTGACCAACACAGCTGCTCAC-3' (SEQ
ID NO:113) and 5'-GAC AnnnncAnTAATGCCATTTGC-3' (SEQ ID NO:102) on the
following pools of human cDNAs: Pool 1 - Adrenal gland, bone marrow, brain - amygdala, brain
- cerebellum, brain - hippocampus, brain - substantia nigra, brain - thalamus, brain - whole, fetal
brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma - Raji, mammary gland,
pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine,
spinal cord, spleen, stomach, testis, thyroid, trachea, uterus.
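
For orientation, basic properties of a PCR primer can be estimated from its sequence alone. The Python sketch below computes length, GC fraction, and a rough melting temperature using the Wallace rule (2 °C per A/T, 4 °C per G/C). That rule is a common rule of thumb and is not part of the method described here. The forward primer is taken as read from the text with extraction marks removed, so the sequence should be treated as an assumption. The helper name `primer_stats` is hypothetical.

```python
def primer_stats(primer: str) -> dict:
    """Return length, GC fraction and a Wallace-rule Tm estimate (deg C)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    gc = seq.count("G") + seq.count("C")
    at = len(seq) - gc
    return {
        "length": len(seq),
        "gc_fraction": gc / len(seq),
        # Wallace rule: 2 deg C per A/T plus 4 deg C per G/C.
        "wallace_tm_c": 2 * at + 4 * gc,
    }


# Forward primer as read from the text (assumed transcription).
forward = "TGCTGACCAACACAGCTGCTCAC"
print(primer_stats(forward))
```

The Wallace rule is most often applied to short oligonucleotides, so the estimate for a primer of this length is only a rough guide.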

Primers were designed based on in silico predictions for the full length or part (one or
more exons) of the DNA/protein sequence of the invention or by translated homology of the
predicted exons to closely related human sequences or to sequences from other species. Usually
multiple clones were sequenced to derive the sequence, which was then assembled similarly to the
SeqCalling process. In addition, sequence traces were evaluated manually and edited for
corrections if appropriate.

Variant sequences are also included in this application. A variant sequence can include a
single nucleotide polymorphism (SNP). A SNP can, in some instances, be referred to as a
"cSNP" to denote that the nucleotide sequence containing the SNP originates as a cDNA. A SNP
can arise in several ways. For example, a SNP may be due to a substitution of one nucleotide for
another at the polymorphic site. Such a substitution can be either a transition or a transversion. A
SNP can also arise from a deletion of a nucleotide or an insertion of a nucleotide, relative to a
reference allele. In this case, the polymorphic site is a site at which one allele bears a gap with
respect to a particular nucleotide in another allele. SNPs occurring within genes may result in an
alteration of the amino acid encoded by the gene at the position of the SNP. Intragenic SNPs may
also be silent, however, when a codon including a SNP encodes the same amino acid as
a result of the redundancy of the genetic code. SNPs occurring outside the region of a gene, or in
an intron within a gene, do not result in changes in any amino acid sequence of a protein but may
result in altered regulation of the expression pattern, for example, alteration in temporal
expression, physiological response regulation, cell type expression regulation, intensity of
expression, or stability of the transcribed message.
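
The distinctions drawn above (transition versus transversion, and silent versus amino acid-altering substitutions) can be made concrete with a short Python sketch. It uses the standard genetic code and considers only single-base substitutions within a codon. The function names `classify_substitution` and `is_silent` are hypothetical and introduced only for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine)."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def is_silent(codon: str, position: int, alt: str) -> bool:
    """True if replacing codon[position] (0-based) with `alt` leaves the
    encoded amino acid unchanged."""
    codon = codon.upper()
    variant = codon[:position] + alt.upper() + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[variant]


print(classify_substitution("A", "G"))  # transition
print(is_silent("GAA", 2, "G"))         # True: GAA and GAG both encode Glu
print(is_silent("GAA", 2, "T"))         # False: GAT encodes Asp
```

Insertions and deletions, which the text also counts as SNPs, would need a separate frame-aware treatment.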

The DNA sequence and protein sequence for a novel Peptidase (HPEP-8)-like gene or
one of its splice forms was obtained solely by exon linking and is reported here as CuraGen Acc.
No. CG50817-04.

Real-time expression analysis was performed on SECP11 (clone CG50817-04). The 
results demonstrate that RNA homologous to this clone is present in multiple tissue and cell 
types. 

Accordingly, SECP11 nucleic acids according to the invention can be used to identify
one or more of these tissue types. The presence of RNA sequences homologous to a SECP11
nucleic acid in a sample indicates that the sample contains one or more of the above tissue types.

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 1086 of 1087 bases (99%) identical to a human peptidase,
hpep-8 mRNA (patn:A37664). The full amino acid sequence of the protein of the invention was
found to have 254 of 255 amino acid residues (99%) identical to, and 254 of 257 amino acid
residues (99%) similar to, the 571 amino acid residue ptnr:patp:Y41704 Human PRO351
protein sequence from Homo sapiens.

The presence of identifiable domains in the protein disclosed herein was determined by
searches using algorithms such as PROSITE, Blocks, Pfam, ProDomain, Prints and then
determining the Interpro number by crossing the domain match (or numbers) using the Interpro
website. The results indicate that this protein contains the following protein domains (as defined
by Interpro) at the indicated positions: domain name trypsin at amino acid positions 15 to 179.
This indicates that the sequence of the invention has properties similar to those of other proteins
known to contain this/these domain(s) and similar to the properties of these domains.

Chromosomal information:

The Peptidase (HPEP-8) disclosed in this invention maps to chromosome 16. This
information was assigned using OMIM, the electronic northern bioinformatic tool implemented
by CuraGen Corporation, public ESTs, public literature references and/or genomic clone
homologies. This was executed to derive the chromosomal mapping of the SeqCalling
assemblies, Genomic clones, literature references and/or EST sequences that were included in
the invention.

Tissue expression 

The Peptidase (HPEP-8) disclosed in this invention is expressed in at least the following
tissues: Adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus,
brain - substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal
lung, heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta,
prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis,
thyroid, trachea, uterus. This information was derived by determining the tissue sources of the
sequences that were included in the invention including but not limited to SeqCalling sources,
Public EST sources, and/or RACE sources.

Cellular Localization and Sorting 

The SignalP, Psort and/or Hydropathy profile for the Peptidase (HPEP-8)-like protein are
shown in Table 7. The results predict that this sequence has no signal peptide and is likely to be
localized in the cytoplasm with a certainty of 0.4500 predicted by PSORT.

The proteins of the invention encoded by clone CG50817-04 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone CG50817-04 protein.

Functional Variants and Homologs

The novel nucleic acid of the invention encoding a Peptidase (HPEP-8)-like protein
includes the nucleic acid whose sequence is provided in Figure 16, or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base while still encoding a protein that maintains its Peptidase
(HPEP-8)-like activities and physiological functions, or a fragment of such a nucleic acid. The invention
further includes nucleic acids whose sequences are complementary to those just described,
including nucleic acid fragments that are complementary to any of the nucleic acids just
described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of non-limiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least in
part to enhance the chemical stability of the modified nucleic acid, such that they may be used,
for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the
mutant or variant nucleic acids, and their complements, up to 1% of the bases may be so
changed.

The novel protein of the invention includes the Peptidase (HPEP-8)-like protein whose
sequence is provided in Figure 16. The invention also includes a mutant or variant protein any of
whose residues may be changed from the corresponding residue shown in Figure 16 while still
encoding a protein that maintains its Peptidase (HPEP-8)-like activities and physiological
functions, or a functional fragment thereof. In the mutant or variant protein, up to about 1% of
the residues may be so changed.
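
The 1% limit on changed positions can be checked mechanically. The Python sketch below compares a variant against a reference of equal length and reports whether the fraction of differing positions stays within the limit. It assumes substitutions only (no insertions or deletions) and works equally for nucleotide or amino acid strings. The names `fraction_changed` and `within_variant_limit` are hypothetical.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions at which `variant` differs from `reference`."""
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(1 for r, v in zip(reference.upper(), variant.upper()) if r != v)
    return diffs / len(reference)


def within_variant_limit(reference: str, variant: str,
                         max_fraction: float = 0.01) -> bool:
    """True if no more than `max_fraction` of positions were changed."""
    return fraction_changed(reference, variant) <= max_fraction


reference = "ACGT" * 50                      # 200 positions
one_change = "T" + reference[1:]             # 1 of 200 changed (0.5%)
three_changes = "TTT" + reference[3:]        # 3 of 200 changed (1.5%)
print(within_variant_limit(reference, one_change))     # True
print(within_variant_limit(reference, three_changes))  # False
```

Whether such a variant retains activity is a separate experimental question that a sequence comparison cannot answer.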

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, and map location for the Peptidase
(HPEP-8)-like protein and nucleic acid disclosed herein suggest that this Peptidase (HPEP-8)
may have important structural and/or physiological functions characteristic of the Serine protease
family. Therefore, the nucleic acids and proteins of the invention are useful in potential
diagnostic and therapeutic applications and as a research tool. These include serving as a specific
or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or
amount of the nucleic acid or the protein are to be assessed, as well as potential therapeutic
applications such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii)
an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid
useful in gene therapy (gene delivery/gene ablation), (v) a composition promoting tissue
regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications implicated in various diseases and disorders described below and/or
other pathologies. For example, the compositions of the present invention will have efficacy for
treatment of patients suffering from: cell proliferative disorder; arteriosclerosis; psoriasis;
myelofibrosis; cancer; autoimmune disorder; Crohn's disease; inflammatory disorder; AIDS;
anaemia; allergy; asthma; atherosclerosis; Graves' disease; multiple sclerosis; scleroderma;
infection; diabetes; metabolic disorder; Addison's disease; cystic fibrosis; glycogen storage
disease; obesity; nutritional edema; hypoproteinemia; and other similar diseases, disorders and
conditions.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic
methods.

Table 4. BLASTN identity search for the nucleic acid of the invention versus GenBank.

>patn:A37664 Human peptidase, HPEP-8 coding sequence - Homo sapiens, 1661 bp. (SEQ ID NO:60)
Length = 1661

Plus Strand HSPs:

Score = 5426 (814.1 bits), Expect = 5.1e-240, P = 5.1e-240
Identities = 1086/1087 (99%), Positives = 1086/1087 (99%), Strand = Plus / Plus
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIMIII

3gacaccagtgatgctcctgggaccctacgcaatctgcgcctgcgtctcatcagtcgccc 

:acatgtaactgtatctacaaccagctgcaccagcgacacctgtccaacccggcccggcc 

II I M II II II 1 1 1 1 M I II 1 1 M Mill I MMM I M II 1 1 1 1 III 1 1 M II MM 
:acatgtaactgtatctacaaccagctgcaccagcgacacctgtccaacccggcccggcc 

rgggatgctatgtgggggcccccagcctggggtgcagggcccctgtcaggtctgataggg 

ii in ii in mi mi i ii mi 1 1 n in ii 1 1 in i n 1 1 n inn ii ii in n 

K3GGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGTCTGATAGGG 
^GAAGAGAAGGAGCAGAAGGGGAGGGGCCTAACCCTGGGCTGGGGGTTGGACTCACAGGA 

40 ~ * MIMMIMMIMMMMIMMIIMMIMMMMMMMMIMMMIIM 

AGAAGAGAAGGAGCAGAAGGGGAGGGGCCTAACCCTGGGCTGGGGGTTGGACTCACAGGA 
CTGGGGGAAAGAGCTGCAATCAGAGGGTGTCTGCCATAGCTGGGCTCAGGCATCTGTCCT 

I II II 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 M 1 1 II 1 1 1 1 1 1 1 II II 1 1 1 1 II 1 1 1 II I 



I II II II I M I II 1 1 1 1 M 1 1 1 II II II I II 1 1 1 II II II M I II 1 1 II I II II II II 1 1 
Sbjct:  301 TGGCTTTGTTGCCTGGCTCCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGA 360

Query:  363 CGGACACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGC 422
Sbjct:  361 CGGACACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGC 420

Query:  423 TCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGG 482
Sbjct:  421 TCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGG 480

Query:  483 GGCAGCTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGT 542
Sbjct:  481 GGC^GCTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGT 540

Query:  543 AGCCTGTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTG 602
Sbjct:  541 AGCCTGTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTG 600

Query:  603 GGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGA 662
Sbjct:  601 GGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGA 660

Query:  663 GGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGT 722
Sbjct:  661 GGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGT 720

Query:  723 AGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTA 782
Sbjct:  721 AGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTA 780

Query:  783 CACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACT 842
Sbjct:  781 CACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACT 840

Query:  843 GGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGA 902
Sbjct:  841 GGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGA 900

Query:  903 GCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGT 962
Sbjct:  901 GCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGT 960

Query:  963 GCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGA 1022
Sbjct:  961 GCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGA 1020

Query: 1023 TGGCAGCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTG 1082
Sbjct: 1021 TGGCAGCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTG 1080

Query: 1083 [illegible]
Sbjct: 1081 [illegible]

Score = 193[illegible], Expect = [illegible]e-82, P = 3.7e-82
Identities = 635/848 (74%), Positives = 635/848 (74%), Strand = Plus / Plus

Query:  600 CTGGGAGGCCAGGCTGATGCAC-CAGGGACAGCTGGCCTGTGGCGGAGC--CCTGGTGTC 656
Sbjct:  818 CTGCTGGCCCAGCCTG-TG-ACACTGGGA--GCCAGCCTGCGGCCCCTCTGCCTGCCCTA 873

Query:  657 AGAGGAGGCGGTGCTAACTGCTGCCCA-C-TG-CTTCATTGGGCGCCAGGCCC-CAGAGG 712
Sbjct:  874 [illegible]

Query:  713 [illegible]
Sbjct:  934 [illegible]

Query:  771 [illegible]
Sbjct:  990 [illegible]

Query:  831 GCCTGTG-ACACTGGGA-GCCAGCCTGCGGCCCCTCTGCCTGC-CCTATGCTGAC-CACC 886
Sbjct: 1045 [illegible] 1101

Query:  887 ACC--TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCAT 944
Sbjct: 1102 ACTGGTGCATGA-GGTGAGGGGCACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGAT 1159

Query:  945 -CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCT 1001
Sbjct: 1160 3CTTGCCAAGGCCCCGCCAG-GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT-GAGGACT 1217

Query: 1002 [illegible]
Sbjct: 1218 [illegible]-G-CAGGTCTACTTC-GCCGAGGAACCAGAGCCCGAG-G 1271

Query: 1061 [illegible]
Sbjct: 1272 [illegible]

Query: 1117 GGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1176
Sbjct: 1332 GGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1390

Query: 1177 CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 1236
Sbjct: 1391 CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 1450

Query: 1237 AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1296
Sbjct: 1451 AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1510

Query: 1297 GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1356
Sbjct: 1511 GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1570

Query: 1357 CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 1416
Sbjct: 1571 CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 1630

Query: 1417 CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1447
Sbjct: 1631 CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1661
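By way of non-limiting illustration, the "Identities" figures reported in Table 4 can be recomputed from a pair of aligned rows. The following Python sketch is an editorial example and not part of the search software used; the function names `blast_identity` and `percent_identity` are hypothetical. It assumes that a gap column ("-") never counts as an identity and that percentages are truncated rather than rounded, which is consistent with the values shown in the table (for example, 635/848 is reported as 74%).

```python
def blast_identity(query_row: str, sbjct_row: str) -> tuple[int, int]:
    """Return (identities, alignment_length) for two equal-length aligned rows."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    # A column is an identity only if both rows carry the same base (not a gap).
    identities = sum(
        1 for q, s in zip(query_row, sbjct_row) if q == s and q != "-"
    )
    return identities, len(query_row)


def percent_identity(identities: int, length: int) -> int:
    """Whole-number percentage, truncated (assumed reporting convention)."""
    return identities * 100 // length


# Rows Query 1117-1176 / Sbjct 1332-1390 from the second HSP of Table 4.
query = "GGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC"
sbjct = "GGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC"

row_identities, row_length = blast_identity(query, sbjct)
print(row_identities, row_length)        # identities and columns in this row
print(percent_identity(1086, 1087))      # first HSP of Table 4
print(percent_identity(635, 848))        # second HSP of Table 4
```

Summing the per-row counts over all rows of an HSP would give the totals reported in the HSP header, provided every row of the alignment is available.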



Table 5. BLASTP identity search for the protein of the invention versus Non-Redundant
Composite and GenSeq for the Peptidase (HPEP-8)-like protein of the invention.

>patp:Y41704 Human PRO351 protein sequence - Homo sapiens, 571 aa. (SEQ ID NO:61)
Length = 571

Plus Strand HSPs:

Score = 1372 (483.0 bits), Expect = 1.5e-170, Sum P(2) = 1.5e-170
Identities = 254/255 (99%), Positives = 254/255 (99%), Frame = +1

Query:  322 QGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQ 501
Sbjct:  239 QGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQ 298

Query:  502 SPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAA 681
Sbjct:  299 SPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAA 358

Query:  682 HCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRP 861
Sbjct:  359 HCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRP 418

Query:  862 [illegible]
Sbjct:  419 LCLPYPDHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILP

Query: 1042 GMVCTSAVGELPSCE 1086
Sbjct:  479 [illegible]

Score = 315 (110.9 bits), Expect = 1.5e-170, Sum P(2) = 1.5e-170
Identities = 56/56 (100%), Positives = 56/56 (100%), Frame = +1

Query:    4 DTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 171
Sbjct:  184 DTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 239

Score = 225 (79.2 bits), Expect = 8.7e-15, P = 8.7e-15
Identities = [illegible]

Query:  586 PSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPE--EWSVGLGTRP 741
Sbjct:   63 [illegible]

Query:  742 --EEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWV 915
Sbjct:  123 [illegible]

Query:  916 [illegible]
Sbjct:  179 3WDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSN--PARPGMLCGGPQPC

Query: 1081 SANQPAADRGPGHSQEQENAGRQMALLPLSS 1176
Sbjct:  234 2GPCQGDSGGPVLCLEPDGHWVQAGIISFAS 265

Score = 102 (35.9 bits), Expect = 7.2e-32, Sum P(2) = 7.2e-32
Identities = 27/84 (32%), Positives = 42/84 (50%), Frame = +1

Query:  295 /LGFVAWLQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQAI
Sbjct:  484 [illegible]

Query:  475 [illegible]
Sbjct:  542 tfQVYFAEEPE-PE-AEPGSCLA 563

Table 6. BLASTN identity search versus the human SeqCalling database for the Peptidase
(HPEP-8)-like protein of the invention.

>s3aq: 132854740 Category D: 12 frag (12 non-5 ■ sig-CG), 636 bp. (SEQ ID NO:62)
Length = 636

Minus Strand HSPs:

Score = 1423 (213.5 bits), Expect = 7.0e-59, P = 7.0e-59
Identities = 313/343 (91%), Positives = 313/343 (91%), Strand = Minus / Plus

Query: 1001 AGCCGGCTGCAG-GCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGAT 943
Sbjct:  295 AGCTGGCTGCCCCGGCCT-GCAGGTTGGATGGACAGCAGCCCTGGCCCT-GTGCCCACCT 352

Query:  942 GCCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 883
Sbjct:  353 ACCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 412

Query:  882 GTCAGCATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 823
Sbjct:  413 GTCAGGATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 472

Query:  822 CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 763
Sbjct:  473 CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 532

Query:  762 CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 703
Sbjct:  533 CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 592

Query:  702 CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 659
Sbjct:  593 CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 636

Score = 757 (113.6 bits), Expect = 1.7e-28, P = 1.7e-28 (SEQ ID NO:103)
Identities = 165/179 (92%), Positives = 165/179 (92%), Strand = Minus / Plus

Query: 1116 AGGTCCCCTGTCAGCAGCTGGTTGGTTGGCCTCACAGCTGGGCAGCTCACCCACAGCACT 1057
Sbjct:  105 AGGTAAGGTGTGGGGGCCTGG--GGCTCACCTCACAGCTGGGCAGCTCACCCACAGCACT 162

Query: 1056 GGTACACACCATCCCCGGCAGAATAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG 997
Sbjct:  163 GGTACACACCATCCCCGGCAGAATAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG 222

Query:  996 GCTGCAGGCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGATGCCTG 938
Sbjct:  223 [illegible]

>s3aq: 134913963 Category E: 1 frag (1 non-CG EST), 415 bp. (SEQ ID NO:104)
Length = 415

Plus Strand HSPs:

Score = 297 (44.6 bits), Expect = 1.1e-06, P = 1.1e-06
Identities = 61/63 (96%), Positives = 61/63 (96%), Strand = Plus / Plus

Query: 1385 TTGTTTTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 1444
Sbjct:   10 TTGGTGTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 69

Query: 1445 ATT 1447
Sbjct:   70 ATT 72
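By way of non-limiting illustration, the "Minus / Plus" HSPs of Table 6 report the query as the reverse complement of the plus-strand sequence, with query coordinates descending. The following Python sketch is an editorial example; the helper name `reverse_complement` is hypothetical. It takes query bases 993-1001 as they appear on the plus strand in Table 4 and reproduces the bases shown at query positions 1001 down to 993 in the first row of Table 6.

```python
# Translation table for the four DNA bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


# Query bases 993-1001 on the plus strand (row "Query: 963 ... 1022" of Table 4).
plus_993_1001 = "CAGCCGGCT"

# Expected to match the first nine bases of row "Query: 1001 ... 943" of Table 6.
minus_1001_993 = reverse_complement(plus_993_1001)
print(minus_1001_993)
```

Applying the function twice returns the original sequence, which is a convenient sanity check when moving between strand orientations.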

Table 7. ClustalW alignment of the protein of the invention with similar peptidase
(HPEP-8)s.

ClustalW alignment of the protein of the invention.

Information for the ClustalW proteins:

Accno                       Common Name                                   Length
CG50817-04 (SEQ ID NO:43)   novel Peptidase (HPEP-8)-like protein
Y41704 (SEQ ID NO:122)      Human PRO351 protein sequence.                571
Y90291 (SEQ ID NO:123)      Human peptidase, HPEP-8 protein sequence.     267

In the alignment shown above, black outlined amino acid residues indicate regions of
conserved sequence (i.e., regions that may be required to preserve structural or functional
properties); greyed amino acid residues can be mutated to a residue with comparable steric
and/or chemical properties without altering protein structure or function (e.g. L to V, I, or M);
non-highlighted amino acid residues can potentially be mutated to a much broader extent without
altering structure or function.

Psort, SignalP and hydropathy results for the Peptidase (HPEP-8)-like protein of the invention.

Table 8. Psort, SignalP and Pfam Results for CG50817-04, Peptidase (HPEP-8)-like
Protein.

PSORT data:

cytoplasm --- Certainty=0.4500 (Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.3000 (Affirmative) < succ>
lysosome (lumen) --- Certainty=0.2415 (Affirmative) < succ>
mitochondrial matrix space --- Certainty=0.1000 (Affirmative) < succ>

SignalP data:

# Measure  Position  Value  Cutoff  Conclusion
  maxC     57        0.130  0.37    NO
  maxY     55        0.066  0.34    NO
  maxS     32        0.311  0.88    NO
  meanS    1-54      0.142  0.48    NO

PFAM data:

Scores for sequence family classification (score includes all domains):

Model    Description  Score  E-value  N
trypsin  Trypsin      69.7   2.7e-21  1
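By way of non-limiting illustration, the "Conclusion" column of the SignalP data in Table 8 can be reproduced by comparing each value with its cutoff. The following Python sketch is an editorial example; the names `SignalPMeasure` and `conclusion` are hypothetical, and the rule "YES only when the value exceeds the cutoff" is an assumption that is consistent with the four rows shown.

```python
from typing import NamedTuple


class SignalPMeasure(NamedTuple):
    name: str
    position: str
    value: float
    cutoff: float


# The four rows of the SignalP data in Table 8.
MEASURES = [
    SignalPMeasure("maxC", "57", 0.130, 0.37),
    SignalPMeasure("maxY", "55", 0.066, 0.34),
    SignalPMeasure("maxS", "32", 0.311, 0.88),
    SignalPMeasure("meanS", "1-54", 0.142, 0.48),
]


def conclusion(measure: SignalPMeasure) -> str:
    """Assumed rule: a measure supports a signal peptide only above its cutoff."""
    return "YES" if measure.value > measure.cutoff else "NO"


conclusions = {m.name: conclusion(m) for m in MEASURES}
print(conclusions)
```

Under this assumed rule every measure falls below its cutoff, in agreement with the absence of a predicted signal peptide for CG50817-04 in Table 8.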


SECP12

A SECP12 nucleic acid and polypeptide according to the invention includes the
nucleic acid sequence (SEQ ID NO:44) and encoded polypeptide sequence (SEQ ID NO:45) of
clone CG50817-05, directed toward novel peptidase (HPEP-8)-like proteins and nucleic acids
encoding them. This is a related variant of SECP11, clone CG50817-04. Figure 17 illustrates
the nucleic acid sequence and the amino acid sequence, respectively. This clone includes a
nucleotide sequence (SEQ ID NO:44) of 1592 bp. The nucleotide sequence includes an open
reading frame (ORF) beginning with an ATG initiation codon at nucleotides 19-21 and ending
with a TGA codon at nucleotides 1582-1584. The encoded protein, having 521 amino acid
residues, is presented using the one-letter code in Figure 17.
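By way of non-limiting illustration, the stated protein length follows from the ORF coordinates. The following Python sketch is an editorial example; the function name `orf_protein_length` is hypothetical. It assumes 1-based, inclusive nucleotide coordinates and that the terminal TGA codon is not translated.

```python
def orf_protein_length(first_nt_of_start: int, last_nt_of_stop: int) -> int:
    """Number of amino acids encoded by an ORF given 1-based inclusive coordinates.

    The span runs from the first base of the initiation codon to the last base
    of the stop codon; the stop codon itself contributes no residue.
    """
    span = last_nt_of_stop - first_nt_of_start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1


# ATG at nucleotides 19-21, TGA at nucleotides 1582-1584 (clone CG50817-05).
secp12_length = orf_protein_length(19, 1584)
print(secp12_length)
```

The span of 1566 nucleotides corresponds to 522 codons, one of which is the stop codon, giving the 521 residues stated above.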



The protein encoded by clone CG50817-05 is predicted by the PSORT program to
localize in the plasma membrane with a certainty of 0.6850, and appears to be a signal protein
(see Table 13 below).

The sequence identified by exon linking was extended in silico using information from at
least some of the following sources: SeqCalling assemblies 153687026, 152507187, 153485867,
153485864 and genomic clone gb_AC009088.5.

The genomic clone was analyzed by Genscan, Grail and/or other programs to identify
regions that were putative exons, i.e., putative coding sequences. The clone was also analyzed
by TBLASTN, TFASTN, TFASTA, BLASTX and/or other programs (i.e., a hybrid approach) to
identify genomic regions translating to proteins with similarity to the original protein or protein
family of interest. The following genomic sequence was thus included in the invention:
gb_AC009088.5.

The DNA sequence and protein sequence for a novel Peptidase-like gene or one of its
splice forms thus derived is reported here as the invention CG50817-05. Genomic clones having
regions with 100% identity to the extended sequence thus obtained were identified by BLASTN
searches with the extended sequence against human genomic databases. The genomic clone was
selected for further analysis because this identity indicates that the clone contains the genomic
locus for these SeqCalling assemblies.

The regions defined by all approaches were then manually integrated and manually
corrected for apparent inconsistencies that may have arisen, for example, from miscalled bases in
the original fragments used, or from discrepancies between predicted homology to a protein of
similarity, to derive the final sequence of the invention CG50817-05 reported here. When
necessary, the process to identify and analyze SeqCalling assemblies, ESTs and genomic clones
was reiterated to derive the full length sequence.

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 1135 of 1140 bases (99%) identical to a gb:GENBANK-ID:Z34002
human PRO351 nucleotide sequence mRNA from Homo sapiens (Table 9). The full amino
acid sequence of the protein of the invention was found to have 476 of 493 amino acid residues
(96%) identical to, and 479 of 493 amino acid residues (97%) similar to, the 571 amino acid
residue patp:Y41704 human PRO351 protein from Homo sapiens (Table 10).

A multiple sequence alignment is given in Table 12, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences.

The presence of identifiable domains in the protein disclosed herein was determined by
searches using algorithms such as PROSITE, Blocks, Pfam, ProDomain and Prints, and then
determining the Interpro number by crossing the domain match (or numbers) using the Interpro
website. The results indicate that this protein contains the following protein domains (as defined
by Interpro) at the indicated positions: domain name trypsin at amino acid positions 61 to 279
and 312 to 476. This indicates that the sequence of the invention has properties similar to those
of other proteins known to contain this/these domain(s) and similar to the properties of these
domains.
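By way of non-limiting illustration, the extent of the two trypsin domain matches can be expressed relative to the 521-residue protein. The following Python sketch is an editorial example; the names `Domain` and `covered_residues` are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


PROTEIN_LENGTH = 521  # residues encoded by clone CG50817-05

# Trypsin domain matches reported for the protein of the invention.
DOMAINS = [Domain("trypsin", 61, 279), Domain("trypsin", 312, 476)]


def covered_residues(domains: list[Domain]) -> int:
    """Count residues covered by at least one domain, tolerating overlaps."""
    covered: set[int] = set()
    for d in domains:
        covered.update(range(d.start, d.end + 1))
    return len(covered)


lengths = [d.length for d in DOMAINS]
coverage = covered_residues(DOMAINS)
print(lengths, coverage, round(100 * coverage / PROTEIN_LENGTH, 1))
```

The two matches do not overlap, so the covered total is simply the sum of the two domain lengths.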

Chromosomal information:

The Peptidase disclosed in this invention maps to chromosome 16. This information was
assigned using OMIM, the electronic northern bioinformatic tool implemented by CuraGen
Corporation, public ESTs, public literature references and/or genomic clone homologies. This
was executed to derive the chromosomal mapping of the SeqCalling assemblies, Genomic
clones, literature references and/or EST sequences that were included in the invention.

Tissue expression

The Peptidase disclosed in this invention is expressed in at least the following tissues:
Adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus, brain -
substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal lung,
heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta, prostate,
salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid,
trachea, uterus. This information was derived by determining the tissue sources of the sequences
that were included in the invention including but not limited to SeqCalling sources, Public EST
sources, and/or RACE sources.

Cellular Localization and Sorting

The SignalP, Psort and/or Hydropathy profiles for the Peptidase-like protein are shown in
Table 13. The results predict that this sequence has a signal peptide with a cleavage site between
positions 35 and 36 and is likely to be localized at the plasma membrane with a certainty of
0.6850.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding a Peptidase-like protein includes the
nucleic acid whose sequence is provided in Figure 17, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 17, while still encoding a protein that maintains its
Peptidase-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of non-limiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least in
part to enhance the chemical stability of the modified nucleic acid, such that it may be used,
for example, as an antisense binding nucleic acid in therapeutic applications in a subject. In the
mutant or variant nucleic acids, and their complements, up to about 1% of the residues may be so
changed.

The novel protein of the invention includes the Peptidase-like protein whose sequence is
provided in Figure 17. The invention also includes a mutant or variant protein, any of whose
residues may be changed from the corresponding residue shown in Figure 17, while still providing
a protein that maintains its Peptidase-like activities and physiological functions, or a functional
fragment thereof. In the mutant or variant protein, up to about 4% of the residues may be so
changed.

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention

The protein similarity information, expression pattern, and map location for the
Peptidase-like protein and nucleic acid disclosed herein suggest that this Peptidase may have
important structural and/or physiological functions characteristic of the Serine protease family.
Therefore, the nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications and as a research tool. These include serving as a specific or selective
nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of
the nucleic acid or the protein is to be assessed, as well as potential therapeutic applications
such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody
target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in
gene therapy (gene delivery/gene ablation), (v) a composition promoting tissue regeneration
in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications implicated in various diseases and disorders described below and/or
other pathologies. For example, the compositions of the present invention will have efficacy for
treatment of patients suffering from: cell proliferative disorder; arteriosclerosis; psoriasis;
myelofibrosis; cancer; autoimmune disorder; Crohn's disease; inflammatory disorder; AIDS;
anaemia; allergy; asthma; atherosclerosis; Graves' disease; multiple sclerosis; scleroderma;
infection; diabetes; metabolic disorder; Addison's disease; cystic fibrosis; glycogen storage
disease; obesity; nutritional edema; hypoproteinemia; and other similar diseases, disorders and
conditions.

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic 
methods. 
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Table 9. BLASTN identity search for the nucleic acid of the invention. 



10 



>patn:Z34002 Human PR0351 nucleotide sequence - Homo sapiens, 2365 bp. (seq id 

NO: 63) 

Length =2365 

Plus Strand HSPs: 

Score = 5649 (847.6 bits). Expect = 4.3e-288, Sum P(2) = 4.3e-288 

Identities = 1135/1140 (99%), Positives = 1135/1140 (99%), Strand = Plus / Plus 



15 



20 



25 



30 



35 



40 



45 



50 



55 



Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 



340 TCCTGCGTGAGGGACTCAGCCCCTGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTG 

I I llllllllllllllllll IIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIII 

639 TGCAGCGTGAGGGACTCAGCCC - TGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTG 



399 



697 



400 CCCAGGGCCTATAACCACTACAGCCAGGGCTCAGACCTGGCCCTGCTGCAGCTCGCCCAC 459 

II III 1 1 MM Mill III II II III 111 I II lllll III III II II III II I II I II II 

698 CCCAGGGCCTATAACCACTACAGCCAGGGCTCAGACCTGGCCCTGCTGCAGCTCGCCCAC 757 



460 CCCACGACCCACACACCCCTCTGCCTGCCCC AGCCCGCCCATCGCTTCCCCTTTGGAGCC 

I III III II IIMIMI I II Mill III Mill IM III II I II II IIIMMI I II I 

758 CCCACGACCCACACACCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGAGCC 



520 



818 



TCCTGCTGGGCCACTGGCTGGGATCAGGACACCAGTGATGCTCCTGGGACCCTACGCAAT 

II IIIMMI III HUM lllll Nil! INN MIIMIMII HUM IIUMI II 

TCCTGCTGGGCCACTGGCTGGGATCAGGACACCAGTGATGCTCCTGGGACCCTACGCAAT 



580 CTGCGCCTGCGTCTCATCAGTCGCCCCACATGTAACTGTATCTACAACCAGCTGCACC AG 

II Mllll lllllllllll II llllllll II III IMIIIMIII llllll II III II II 

878 CTGCGCCTGCGTCTC ATC AGTCGCCCCACATGTAACTGTATCTACAACCAGCTGC ACC AG 
640 CGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTG 

I III III 111 MIIMMII III II I II IMM III III II III II llllll II I llll 

938 CGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTG 

700 CAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACAC 

II Ml III lllllllllll II lllll I llll III llllllll III Mill III III II II 

998 CAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACAC 



760 



519 



817 



579 



877 



639 



937 



699 



997 



759 



1057 



819 



TGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTG 

II Ml III lllll Mill I II II I II I lllllllllll Mill III llll III I II II II 

1058 TGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTG 1117 



820 



879 



CTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCT 

II II III I III llllll II II III II llllllll lllll llllll II llllll II III II 
1118 CTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCT 1177 

880 TTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGT 939 

I MM III Ml II llll II II lllll II III III lllllllllll lllll lllll III II 

1178 TTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGT 1237 

940 GGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCC 999 

Mill III llll III lllllllllll llllllll lllllllllll II llllll II III II 
1238 GGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCC 1297 



60 Query: 1000 AGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTG 1059 

Mill MM II II II II II II III II II llllll lllll llllll II llll I III III II 
1298 AGGCTGATGC ACC AGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTC AGAGGAGGCGGTG 1357 



65 



Sbjct: 



Query: 
Sbjct: 



1060 CTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTG 1119 

II Ml II I i 1 1 1 1 1 j M 1 1 1 1 III II I llll ill llllllll III lllll llllllll II 

1358 CTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTG 1417 



44 



10 



15 



20 



25 



30 



35 



40 



45 



50 



55 



60 



65 



70 



Query: 
Sbjct: 
Query: 
Sbjct: 

Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Score = 



120 GGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCAC 1179 

II MM! Nil 1 1 I llllllllll IMIIIM III II III II III MIM Mill III II 

418 GGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCAC 1477 
180 CCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCC 1239 

II IMMII MM I IMMIII IMIIIMI MM II Ml II III M III III II III I 

478 CCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCGAGCCTGTGACACTGGGAGCC 1537 



240 AGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGC 1299 

II Ml MM IMIIIMI IMMIMI 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 

538 AGCCTGCGGCCCCTCTGCCTGCCCTATCCTGACCACCACCTGCCTGATGGGGAGCGTGGC 1597 
300 TGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTG 1359 

Ml 1 1 1 II M II llllll Mill III Mill Ml II III II Ml II III IMMIII II 

598 TGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTG 1657 
360 ACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGC 1419 

II Mill 1 1 1 1 II III II MM I III Mill MM I II 1 1 II III M II I II 1 1 II I II I 

658 ACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGC 1717 
420 CCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGCC 1479 

M II 1 1 1 II II Mill II MIIIIMI IMIMIMI III II Mill III Mill II I 

718 CCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGC 1777 
948 (142.2 bits), Expect = 3.0e-74, Sum P(2) = 3.0e-74 (sbq id actios) 



Identities = 882/1448 (60%), Positives = 882/1448 (60%), Strand = Plus / Plus


Query: 


110 


TCACCACCTATGCTATCAACGTGAGCCTGATGTGGCTCAGTTT-CCGGAAGGTCCAAGAA 

llllll MM 1 llllll llllll 1 II 1 Nil 

TGACCTCATCTGCTTTGCTT-TGGTCTTCAAGCCGCTCAGCGTGCCTGT-GGACAGCGTC 


168 


Sbjct: 


386 


443 


Query: 
Sbjct: 


169 
444 


CCCCAGGGCCAACCCAAGCCTCAGGAGGGCAACACAGTCCCTGGCGAGTGGCCCTGGCAG 

Ml II II MIMIIIMIIIIMIIMIIMIIIMMIMIIIMIMMIMI 

GCCCCGGCCCC-CCCAAGCCTCAGGAGGGCAACACAGTCCCTGGCGAGTGGCCCTGGCAG 


228 
502 


Query: 


229 


GCCAGTGTGAGGAGGCAAGGAGCCCACATCTGCAGCGGCTCCCTGGTGGCAGACACCTGG 

II Mill II II Mill II MMMIMMMMMM Mill III Mill II III III M 

GCCAGTGTGAGGAGGCAAGGAGCCCACATCTGCAGCGGCTCCCTGGTGGCAGACACCTGG 


288 


Sbjct: 


503 


562 


Query: 


289 


GTCCTCACTGCTGCCCACTGCTTTGAAAAGGCAGCAGCAACAGAACTGAATTCCTGCGTG 

I 1 1 1 1 II 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 II 1 1 II 

GTCCTCACTGCTGCCCACTGCTTTGAAAAGGCAGCAGCAACAGAACTGAATTCCTG-GTC 


348 


Sbjct: 


563 


621 


Query: 


349 


AGGGACTCAGCCCCTGGGGCCGAAG - AG-GTGGGGGTGGCTGCCCTGCAGTTGCCCAGG - 

II 1 II 1 MM 1 1 II III III 1 llllll 1 1 III 

AGTGG-TC C - TGGGTTCTCTGC AGCGTGAGGG ACTCAGCCCTGGGGCCG AAGAGGT 


405 


Sbjct: 


622 


675 


Query: 


406 


GCCTATAACCACTAC AGCCAGG -GCTCAGA-CCTGGCCCTGCTGCAGCTCGC-C- CACCC 

1 1 1 1 1 II II II III III 1 II MM 1 1 II 1 

GGGGGTGGCTGCC - CTGC - AGTTGCCCAGGGCCTATAACCACTAC AGCCAGGGCTCAGAC 


461 


Sbjct: 


676 


733 


Query: 


462 


CACGACCCACACACCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGA-GCCT 

1 1 II 1 1 III III Mill 1 MM 1 1 Mill II III 

CTGGCCCTGCTG-CAGCTC-GCCCACCCCA--CGACCCA-CACA-CCCCTCTGCCTGCC- 


520 


Sbjct: 


734 


786 


Query: 


521 


CCTGCTGGGCCACTGGCTGGGATCAGGA- -CACC AG - TGATGCTC CTGGGACCCT-A 

|| II 1 III 1 1 1 III 1 II 1 II II MMM 1 1 

CCAGCCCGCCCATCGCTTCCCCTTTGGAGCCTCCTGCTGGGCCACTGGCTGGGATCAGGA 


573 


Sbjct: 


787 


846 


Query: 


574 


CGCAA - TC - TGCGCCTGCGTCTC ATCAGTCGCCCCACATGTAACTGTATCTACAACCAGC 

MM Ml MM II 1 1 1 III 1 II III III II 1 II 

CACCAGTGATGCTCCTGGGACCC-T-A — CGCAAT-C-TGCGCCTGCGTCT-CATC-AGT 


631 


Sbjct: 


847 


898 


Query: 


632 


TGCACCAGCGACACCTGTC-CAAC- -CCGGCCCGGCCTGGGATGCTATGTGGGGGCC- -C 

II III 1 MM 1 II II II II III 1 Ml II 1 

CGCCCCACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACC-TGTCCAACCCGGC 


686 


Sbjct: 


899 


957 


Query: 


687 


CCAGCCTGGGGTGC-A-G-GGCCCCTGTCAGGGAGAT-TCCGGGGGCCCTGTGCTGTGCC 


742 






Sbjct: 


958 


it i i i i i i i mi i i ii i iii i i i mi t I I i I I I I I 
II 1 1 1 II 1 1 1 1 1 1 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

CCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGT-CAGGGA- 


1015 


Query: 


743 


TCGAGCCTGACGGAC ACTGGGTTCAGGCT -G -GC ATCATCAG -CTTTGCAT-CAAGCTGT 

II II ii i ill I I l l 1 1 l 1 1 1 I 1 1 M 1 1 1 1 II 1 
II II II 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 

--GATTCCGGGGGCC-CTGTGCTGTGCCTCGAGCCTGA-CGGACACTGGGTTCAGGCTG- 


798 


Sbjct: 


1016 


1070 


Query: 


799 


GCC-CAGGAGGAC -GCTCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTC- -CTGGCT 

II II II II 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
II II II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 

GCATCATCAGCTTTGCATCAA-GCTG - TGCCCAGGAGGAC - GCTC -CTGTGCTGCTGACC 


854 


Sbjct: 


1071 


1126 


Query: 


855 


G-CA- -G - -GCTCG - AGTTCAGGGG- GCAGCTTTCCTGGCCCAGAGCCCAGAGACCCCGG 
II 1 MM MM! II MM 1 1 1 Mil 1 II MM 


907 


Sbjct: 


1127 


II i Mil 1 1 1 1 1 ll 1 1 1 1 l l 1 1 1 1 I ill 1 1 1 1 

AACACAGCTGCTCACAGTTCCTGGCTGCAGGCT- - CGAGTTCAGGGGGCAGCTTTCCTGG 


1184 


Query: 


908 


AG ATG AGTGATGAGG ACAGCTGTG - T - AGCC - TGTGGATC - C T -TGAGGAC AGCAGGTCC 
III Mill 1 1 1 1 M Mill MM III till 


962 


Sbjct: 


1185 


III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 III (111 
CCCAGAGCCCAGAG-ACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCC 


1243 


Query: 
Sbjct: 


963 
1244 


CC-AGGCAGGAGCACCCTCCCCATGGCCCTGGGAGG-CCAGGCTGATGCACCAGGGACAG 

II 1 1 II II 1 I II II 1 M MM M 1 III M M M 
Mil 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II 1 III 1 1 1 1 1 1 

TTGAGG-AC- AGCAGG-TCCCCA-GGCA- — GGAGCACCCTCCCCATGGCCCTGGGAGGC 


1020 
1296 


Query: 
Sbjct: 


1021 
1297 


CTGGCCTGTGGCGG - AGCC -CTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCT 

1 1 1 1 1 1 II II 1 1 1 1 1 1 1 1 II 1 II 1 M 1 1 1 1 

1 1 1 1 1 1 II II 1 1 1 1 1 1 1 II 1 1 1 1 II 1 1 I I 1 

CAGGC -TGATGCACCAGGGACAGCTGGCCT — GTGGCGGAGCC — CTGGTGTCAGAGGAG 


1078 
1351 


Query: 


1079 


TCATTGGGCGCCAG -GCCC -C AG AGGAATGG AGCGT- AGGGCTG- G -GG ACCAGACCGGA 

1 II 1 1 1 1 1 1 1 t ! 1 t 1 1 Mil 1 II 1 II 1 1 1 

( II 1 1 1 1 ! 1 1 llllil 1 1 1 1 1 1 1 1 Mill 

GCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTA 


1133 


Sbjct: 


1352 


1411 


Query: 
Sbjct: 


1134 
1412 


GG AGTGGGG - CCTG AAGC AGCTC A - T CCTGC ATGGAGCC TAC ACC C ACCC TG - AGGGGGG 

II 1 1 1 1 1 1 1 1 1 II II M 1 1 1 M 1 1 1 1 1 1 1 1 1 1 I I 

II 1 1 1 1 1 1 1 1 1 II II II ; III 1 1 1 1 1 1 1 1 1 M 1 

GGGCTGGGGACCAGAC - CGGAGGAGTGGGGCCTGAAGC - - AG - CTCATCCTGCATGGAGC 


1190 
1467 


Query: 
Sbjct: 


1191 
1468 


CTACGACATGGCCCTCCTGCTG-CTGGCCCA-GCCTGTGACACTGGGAGCC-AGCCTGCG 

1 1 1 1 1 1 1 I 1 I I III II 1 1 1 Ml M 1 M II 1 1 1 
llllil 1 1 1 1 1 III 1 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 

CTAC-ACCCA-CCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTG 


1247 
1525 


Query: 


1248 


GCCCCTCT-GCCTGCCCTATGCTGACCACCA-CCTGCCTGATGGGGAGCGTGGC-TGGGT 

II 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 M 1 II II 1 111 1 

II II 1 1 1 1 llllil 1 II 1 1 1 1 M III III 1 

ACACTGGGAGCCAGCC TGCGGCCCCTCTGCCTGCCCTATCCTGACCACCACCTGCCT 


1304 


Sbjct: 


1526 


1582 


Query: 


1305 


TCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCT 

till III 1 i II i 1 1 1 1 1 1 1 M M II 1 I 1 1 


1364 


Sbjct: 


1583 


1 1 1 1 III II II 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 

GATGGG - - GAGCGTGGCTGGGTTCTGGGACGGGCCCGC - CC AGG - AGCAGGCATCAGCTC 


1638 


Query: 


1365 


CCTGGGGCCTAGGGCCTGC - AGCCGGCTGCATGC - AGCTCCTGGGGGTGATGGCAGCCCT 
Ml 1 1 II M 1 1 III III III 1 1 M 1 1 Ml 1 


1422 


Sbjct: 


1639 


III 1 1 1 1 1 1 1 1 III III III 1 1 1 1 1 1 III 1 

CCTCCAGAC-AGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAG 


1697 


Query: 


1423 


ATTCTGCCGGGGATGGTGTGTACCAGT-- GCTGTGGGTGAGCTGC-CCAG— CTGTGAGG 
1 1 II II 1 M II 1 III || I it M II MM 1 1 II 1 1 1 


1477 


Sbjct: 


1698 


1 1 1 1 II 1 1 1 II I III 1 1 1 1 1 1 1 II MM M 1 1 1 II 

CTCCTGGGGGTGATGGCA-GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTG-GG 


1755 


Query: 
Sbjct: 


1478 
1756 


CCAACCAACCAGCTGCTGACAGGGGACCTGGC-CATTCTCAGGAACAAGAGAATGCAGGC 
II 1 II II M M 1 1 MM II 1 1 M II 1 1 1 II 1 II 

II M M 1 M 1 M 1 MM M MM II 1 M II II 1 

TG AGCTGCCC AGCTG - TG AGGGCCTGTCTGGGGC AC - C ACTGGTGCATGAGG- TG - AGG - 


1536 
1810 


Query: 


1537 


AGGCAAATGGCATTACTGCCC 1557 

MM MM M III II 

-GGCACATGG - - TTCCTGGCC 1828 




Sbjct: 


1811 




Score = 894 (134.1 bits), Expect = 4.3e-288, Sum P(2) = 4.3e-288 (SEQ ID NO:106)
Identities = 182/186 (97%), Positives = 182/186 (97%), Strand = Plus / Plus


Query: 
Sbjct: 


1 

171 


CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 

MIMMMMMMMMMMMMMMMIIIIIMMMIMMIMIMIMM 

CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 


60 
230 






Query: 


61 


Sbjct: 


231 


Query: 


121 


Sbjct : 


291 


Query: 


181 


Sbjct: 


351 


Score = 699


Identities = 


Query: 




Sbjct: 


1508 


Query: 


1045 


Sbjct: 


1564 


Query: 


1102 


Sbjct: 


1623 


Query: 


1160 


Sbjct: 


1679 


Query: 


1220 


Sbjct: 


1734 


Query: 


1276 


Sbjct: 


1791 


Query: 


1334 


Sbjct: 


1849 


Query: 


1391 


Sbjct: 


1907 


Query: 


1450 


Sbjct: 


1961 


Query: 


1506 


Sbjct: 


2021 


Query: 


1566 


Sbjct: 


2080 



CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGC ATTGTTTGTAT CACCACCTAT 290 
GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCCAA 180 

llllllllll IMIIIIIIIIIIIIMIMMIMIIIIIII IIIIIIIIIIMIII I 

GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCAAG 350 



II 



Identities = 391/603 (64%), Positives = 391/603 (64%), Strand = Plus / Plus
CTGGGAGGCCAGGCTGATGCAC - CAGGGACAGCTGGCCTGTGGCGGAGC- -CCTGG — TG 1044 

ill i mi in n n i 1 1 1 1 ii inn in i mi i 

CTGCTGGCCCAGCCTG-TG-ACACTGGGA- -GCCAGCCTGCGGCCCCTCTGCCTGCCCTA 1563 



I ii I III I I II I I III II I I I 1 1 II I I I 



II I n I I II 1 1 1 1 1 1 I III III I II I II 

GAGCAG-GCATCAG-CTCCCT-CCAGACAGTGCCCX3TGAC-CCTCCTGGGGCCTAGGGCC 1678 
TGCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCC 1219 

mi i i ii ii ii i i inn i i n inn i mi n 

TGC A- GCCGGCTGC ATGC AGC - TCCTGGGGGTGATGGC A — GCCCTATT- CTGCCGGGGA 1733 



i mi n i i i i i ii mi in i in i in in 



CACC - -TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCA 

Ml III III II Ml I I HIM M HIM II Ml Ml 



1333 



1848 



T-CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGC 1390 

I I II III III III I I I II I II I I I I I I I 

TGCTTGCCAAGGCCCCGCCAG-GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT-GAGGAC 1906 



II I 1 1 II I II II I III III III Mil III M I II 

TGGGT - CAGCAGTTTGGACTG — G- C AGGTCTACTTC - GCCGAGGAACCAGAGCCCGAG- 



1960 



1505 



GCTGTGGGTG-A-GCTGCCCAGCTGTGAG--GCCAACCAACCAGCTGCTGACAGGGGACC 

| | I I I II I llllll II I lllllllllllllllllllllllllllll 

GC TG AGC C TGG AAGC TGCCTGGC C AAC AT AAGC C AAC CAACC AGC TG CTG ACAGGGG ACC 2020 

TGGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATT ACTGCCCCTGTCCTC 1565 

Mlllllllllllll llllllllllllllllllllllllllllllllllllllllllll 

TGGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATT ACTGCCCCTGTCCTC 2079 

CCCACCCTGTCATGTGTGATTCCAGGC 1592 

llllllllllll MIIIIIIIIIIIM 

CCCACCCTGTCATGTGTGATTCCAGGC 2106 



>patn:A37664 Human peptidase, HPEP-8 coding sequence - Homo sapiens, 1661 bp. (SEQ ID NO: 64)

Length = 1661

Plus Strand HSPs:

Score = 3831 (574.8 bits), Expect = 5.6e-168, P = 5.6e-168
Identities = 767/768 (99%), Positives = 767/768 (99%), Strand = Plus / Plus



Query:  712 CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 771
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  320 CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 379

Query:  772 GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 831
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  380 GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 439

Query:  832 ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 891
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  440 ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 499

Query:  892 AGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGG 951
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  500 AGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGG 559

Query:  952 ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 1011
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  560 ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 619

Query: 1012 CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCC 1071
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  620 CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCC 679

Query: 1072 CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 1131
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  680 CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 739

Query: 1132 GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 1191
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  740 GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 799

Query: 1192 TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 1251
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  800 TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 859

Query: 1252 CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 1311
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  860 CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 919

Query: 1312 CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 1371
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  920 CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 979

Query: 1372 CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 1431
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  980 CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 1039

Query: 1432 GGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGCC 1479
            |||||||||||||||||||||||||||||||||||||||||||||| |
Sbjct: 1040 GGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGC 1087

Score = 974



Identities = 632/998 (63%), Positives = 632/998 (63%), Strand = Plus / Plus 






Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 



546 GGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 605 

Mill IMMI MMIM IIIIIMI Mill III MIMMMMIMM IIIMIMM 

1 GGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 60 
606 CACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCC 665 

II I MM II M II II III Ml II III I II 1 1 1 1 1 1 II MMM 1 1 1 M II M I M Mill 

61 CACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCC 120 
666 TGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGGA-GATTCCG 724 

M I M III I II II I M 1 1 M M II M II M I 1 1 1 I II IM I MM M M 1 1 Ml I 

121 TGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGTCTGATAGGG 180 

725 GGG-GCCCTGT-GCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCTGGCA-TCATCA 781 

|| | || I lllllll Mill I II I I I III II 

181 AGAAGAGAAGGAGCAGAAGGG-GAGGG-GCCTAACCCTGGGCTGGGGGTTGGACTCA-CA 237 

782 G — CTTTGCATCA-AGCTGTGCCCAGGAGGACGCTCCTGTGCT-GCTGACCA-ACACAGC 836 




Sbjct: 


238 


GGACTGGGGG AAAGAGCTGCAATCAG- AGGGTG - TC - TGCCATAGCTGGGCTCAGGCATC 


294 


Query: 


837 


TGCTCACAGTTCCTGGCTGCA-GGCTC G - AG - TTCAGGGGGCAGCTTTCCTG -GCCC 

ii ii i i i i iii inn i ii iii linn n i mi in 

TG-TCCTTGG-CTTTGTTGCCTGGCTCCAGGGAGATTCCGGGGGCC-CTGTGCTGTGCCT 


889 


Sbjct: 


295 


351 


Query: 


890 


AGAGCCC- AGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCT - - 

inn i in i ii i i n i ii i ii i i in i ii in 

CG AGCCTGACGG ACACTGG - GTTCAG - GC TG - - G - CATC A- TC - AGCTT - TGC ATC AAGC 


946 


Sbjct: 


352 


403 


Query: 
Sbjct: 


947 
404 


TGAGGACAGCAGGTC - C - CCAG-GCAGGAGC ACCCTCCCCATGGCCCTGGGAGG - CCAGG 

II 1 III III 1 1 II 1 II 1 1 III 1 1 1 1 II II II II 

TGTGCCCAGGAGGACGCTCCTGTGCTGCTG-ACCAACAC-A-GCTGCTCACAGTTCCTGG 


1002 
460 


Query: 


1003 


CTG-ATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCT 


1061 


Sbjct: 


461 


III 1 II 1 II III III 1 1 MM 1 Mill III 1 1 

CTGCAGGCTCGAGTT - CAGGGGGC AGCTTTCCTGGCCCAGAGCCCAGAGACCCCGGAG AT 


519 


Query: 


1062 


AACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAG — CGTAGGGCTG 

i ii ii n in ii i i n inn i ii i mi 

GAGTGATGAGGACAGCTGTGTAGCCTGTG-GATCCTTGAGGACAGCAGGTCCCCAGGCAG 


1119 


Sbjct: 


520 


578 


Query: 
Sbjct: 


1120 
579 


GGG- ACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGC - CTACACCC 

I 1 III II II III II III II 1 II III M 1 

GAGCACCCTCCCCATGGCCCTGGGAGGCCAG - -GCTGATGCACCAGGGACAGCTGGCCTG 


1177 
636 


Query: 
Sbjct: 


1178 
637 


ACCCTGAGGGGGGCTA-C-GACATGGCCCTCCTG-CTGCTGGCCCAGCCTGTGACACTGG 

1 III I I 1 II III 1 II MINI MM III II III 

TGGCGGAGCCCTGGTGTC AGAGGAGGCGGTGCTAACTGCTG - CCC A — CTGCTTCATTGG 


1234 
693 


Query: 


1235 


GAGCCAGCCTGCGGCCCCTCTGCCTGCCCTATG - CTGACCACCAC - CTGCCTGA- TGGGG 

1 Mill III II II II 1 III MM 1 1 II Mill 

GCGCCAGGCCCCAGAGGAA-TGGA-GCG - TAGGGCTGGGGACCAGACCGGAGGAGTGGGG 


1291 


Sbjct: 


694 


750 


Query: 


1292 


AGCGTGGCTGGGT-TCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCC-CTCCAGAC 

1 1 1 1 1 1 1 1 II 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 

CCTGAAGCAGCTC ATCCTGCATGGAGCCTAC- ACC — CACCC - TGAGGGGGGCTAC -GAC 


1349 


Sbjct: 


751 


805 


Query: 


1350 


AGTGCCCGTGACCCTCCTGGG GCCTAGGGC- CTGC - AGCCGGC -TGCATGCAGCTCC 

1 MM 1 II MM MM 1 1 III MM II III II III 

ATGGCCC - TCCTGCTGCTGGCCC AGCCTGTG ACACTGGGAGCCAGCCTGCG - GCCCCTC - 


1403 


Sbjct: 


806 


862 


Query: 


1404 


TGGGGGTGATG-GCAG - CC - CTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGT -G 

II 1 1 II 1 II 1 1 Mill 1 II 1 II 1 II 1 MM 1 

TGCCTGCCCTATGCTGACCACCAC-CTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACG 


1459 


Sbjct: 


863 


921 


Query: 


1460 


AGCT-GCCCAGCTGTGAGGCCAACCAACCAGCTGCTGACAGGGGACCTGGCCATTCTCAG 

II MUM 1 MM 1 II 1 II 1 Mill 1 1 II II 1 II 1 

GGCCCGCCCAGGAGC - AGGC - - ATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGG 


1518 


Sbjct: 


922 


978 


Query: 


1519 


GAACAAGAGAATGCAGGCAGGC 1540 

1 1 II 1 Mill 1 III 

GC -CTAGGGCCTGCAGCC-GGC 998 




Sbjct: 


979 




Score = 706 (105.9 bits), Expect = 1.9e-23, P = 1.9e-23 (SEQ ID NO:109)
Identities = 390/603 (64%), Positives = 390/603 (64%), Strand = Plus / Plus


Query: 


990 


CTGGGAGGCCAGGCTGATGCAC - CAGGGACAGCTGGCCTGTGGCGGAGC - -CCTGGTGTC 

Ml 1 MM III II II 1 MM II Mill III 1 MM 1 

CTGCTGGCCCAGCCTG-TG-ACACTGGGA — GCCAGCCTGCGGCCCCTCTGCCTGCCCTA 


1046 


Sbjct: 


818 


873 


Query: 
Sbjct: 


1047 
874 


AG AGGAGGCGGTGCTAACTGCTGCCCA-C -TG -CTTCATTGGGCGCCAGGCCC - CAGAGG 

1 II II III II 1 1 II II II 1 1 1 Mill 1 III 

TGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGG 


1102 
933 


Query: 


1103 


AATGGAGCGTAGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCAT - -CCT 

1 1 II 1 1 II llllll 1 III III 1 II 1 III 

AGCAG-GCATCAG-CTCCCT-CCAGACAGTGCCCGTGAC-CCTCCTGGGGCCTAGGGCCT 


1160 


Sbjct: 


934 


989 






Query: 
Sbjct: 


1161 
990 


GCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCA 

III 1 Mill II 1 1 Mill 1 1 II Mill 1 MM II 

GCA-GCCGGCTGCATGCAGC-TCCTGGGGGTGATGGCA — GCCCTATT-CTGCCGGGGAT 


1220 
1044 


Query: 


1221 


GCCTGTG - ACACTGGGA- GCCAGCCTGCGGCCCCTCTGCCTGC -CCTATGCTGAC - CACC 

1 MM II 1 1 1 1 1 II MM III 1 III 1 III MM 

GG -TGTGTAC - CAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGCCTGT- CTGGGGCACC 


1276 


Sbjct: 


1045 


1101 


Query: 
Sbjct: 


1277 
1102 


ACC — TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCAT 

II III III II III 1 1 Mill II Mill II III 1 1 II 

ACTGGTGCATGA-GGTGAGGGGCACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGAT 


1334 
1159 


Query: 


1335 


-CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCT 

III II 1 III III Mill III MM 1 1 II 

GCTTGCCAAGGCCCCGCCAG-GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT-GAGGACT 


1391 


Sbjct: 


1160 


1217 


Query: 


1392 


GCATGCAGCTCCTGGGGGTGATGGCAGCCCTA-TTCTGCCGGGGATGGTGTGTACCAGTG 

1 1 MM 1 II II 1 III III III MM III II MM 

GGGT-CAGCAGTTTGGACTG- -G -CAGGTCTACTTC- GCCGAGGAACCAGAGCCCGAG-G 


1450 


Sbjct: 


1218 


1271 


Query: 
Sbjct: 


1451 
1272 


CTGTGGGTG - A-GCTGCCCAGCTGTGAG — GCCAACCAACCAGCTGCTGACAGGGGACCT 

Ml 1 II 1 llllll II 1 Mill II IN II 1 II MINI II II Mill 

CTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGACAGGGGACCT 


1506 
1331 


Query: 
Sbjct: 


1507 
1332 


GGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 

II 1 II 1 II II 1 1 1 1 MMMM M Ml M M M 1 M IM 1 1 MM 1 Ml M 1 M Ml 1 

GGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 


1566 
1390 


Query: 


1567 


CC ACCCTGTC ATGTGTGATTCCAGGC 1592 




Sbjct: 


1391 


II 1 II 1 1 1 1 II 1 1 1 II II II II II II 

CCACCCTGTCATGTGTGATTCCAGGC 1416 




Score = 481 (72.2 bits), Expect = 1.1e-12, P = 1.1e-12 (SEQ ID NO:110)
Identities = 409/666 (61%), Positives = 409/666 (61%), Strand = Plus / Plus 


Query: 


207 


CCCTGGCGAGTGGCCCTGGCAGGCCAGTGTGAGGAGGCAAGGAGCCCACATCTGCAGCGG 

MM 1 1 lllllllll lllllll III 1 II III 1 III MM 

CCCTCCCCA-TGGCCCTGGGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGG 


266 


Sbjct: 


584 


642 


Query: 


267 


CTCCCTGGTGGCAGACACCTGGGTCCTCACTGCTGCCCACTGCTTTGAAAAGGCAGCAG- 

MMMM MM . Ill II MIMMMIMIMM 1 III III 

AGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCTTC-ATTGGGCGCCAGG 


325 


Sbjct: 


643 


701 


Query: 


326 


CAACAGAACTGAATTCCTGCGTGAGGGACTCAGCCCCTGGGGCCGAAGAGGTGGGGGTGG 

1 MM Ml MM MM II 1 III III III Mill 1 

CCCC AGAG - - G AATGG A - GCGT - AGGG - C TGGGG ACC AG AC - CGGAGGAG - TGGGGCC TG 


385 


Sbjct: 


702 


754 


Query: 


386 


CTGCC -CTGCAGT - TGCCCAGGGCCTATAACCACTAC - AGCCAGGGCTCAGACCTGGCCC 

II II II III 1 Mill 1 MM II Mill III llllll 

AAGCAGCT - CATCCTGCATGGAGCCTACACCCACCCTGAGGG -GGGCTACGACATGGCCC 


442 


Sbjct: 


755 


812 


Query: 


443 


TGCTGCAGCTCGCCCACCC CAC — G-ACCCA-CA — CA-CCCCTCTGCCTGCCCC 

i mi iii him ii iii iiiiii i iiiiiiiiiiiiiii 

TCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCT 


490 


Sbjct: 


813 


872 


Query: 


491 


AGCCCGCCCATCGCTTCCCCTTTGGAGCCTCCTG-CTGGGCCACTGGCTGGGATCAGGAC 


549 


Sbjct: 


873 


1 II MM II II MM 1 II Mill MM II II 1 

ATGCTGACCACCACCTGCCTGATGGGGAG-CGTGGCTGGGTT-CTGGGACGGGCCCGCCC 


930 


Query: 


550 


ACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCCCACA 

1 II III II MM III 1 Mill 1 II II II II 1 

AGGAGC - AGGCATCAGCT - CCCT -C - CAGACAGTGCCCGTGACCC -TCC - TGGGGCCT - A 


609 


Sbjct: 


931 


983 


Query: 


610 


TGTAACTGTATCTACAACCA-GCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCCTGG 

1 ill 1 1 II II II II 1 1 II 1 1 III 1 III II 

GGGC- CTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGC -AGCCCTATTCTGCCGGG 


668 


Sbjct: 


984 


1041 


Query: 


669 


GATGCTATGTGGGGGCCCCCAGCCTGGG-GTGCAGGGCCCCTGTCAGGGAGATTCCGGGG 

MM MM III MM MM 1 MM MM M MM 



727 



Sbjct: 



1042 GATGGTGTGTACCAGTGCT--G — TGGGTGAGCTGCCCAGCTGTGAGGGCCTGTCTGGGG 1097 



15 



Query: 


728 


Sbjct: 


1098 


Query: 


783 


Sbjct: 


1156 


Query: 


835 


Sbjct: 


1215 



n in in ii i iii i ii mi i inn ii i ii i 

iCCACTGGTGCATGAGGTGAGGGGCACATGGTTCCTGGC — CGGGCTGCACAGCTTCGG 1155 
?TTGCAT-C-AAG-CTGTGCCCAGGAGGACG — CT-C-CTGTGCTGC-TGACCAACACA 834 

III I I III I III III I II II I I I III I II I I I 

Sbjct: 1156 AGP 

10 

Ouerv: 835 C 

iii iii inn iii iii mi ii i iii 

^CTGGGTCAGCAGTT- - TGGACTGGCAGG - TCTACTTC 1249 

Figure 10. BLASTP identity search for the protein of the invention. 

>patp:Y41704 Human PRO351 protein sequence - Homo sapiens, 571 aa. (SEQ ID NO:65)

Length = 571

Plus Strand HSPs:

Score = 2544 (895.5 bits), Expect = 1.1e-263, P = 1.1e-263
Identities = 476/493 (96%), Positives = 479/493 (97%), Frame = +1

Query: 19 MLLSSLVSLAGSVYLAWILFFVLYDFCIVCITTYAINVSLMWLSFRKVQEPQGQPKPQEG 198

M 1 1 MMM I M 1 1 II 1 1 1 1 1 1 M M M 1 1 1 II M Ml M I II i M M I M I • I + I 

MLLSSLVSIAGSVYIAWILFFVXYDFCIVCITTYAINVSLMWLSFRKVQEPQGKAK-RHG 59 
NTVPGEWPWQASVRRQGAHI C SGSLVADTWVLTAAHCFEKAAATELNS — CVRDS 357 

i; 1 1 1 !! II 1 1 1 1 1 1 i 1 1 1 1 1 1 ! Ml 1 : 1 1 1 1 1 1 ll 1 1 1 1 1 1 1 1 1 M I I I 

NTVPGEWPWQASVRRQGAHI CSGSLVADTWLTAAH^ 119 
-APGAEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTHTPLCLPQPAHRFPFGASCWAT 534 

+ IIMIMIIIIIIMI IMMIMMIIMM Mill MM II III II IMM Mill 

LSPGAEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTHTPLCLPQPAHRFPFGASCWAT 17 9 
GWDQDTSDAPGTLRNLRLRLI SRPTCNCI YNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 714 

IIMMMIMMMMIMMMMMMMMMMIMIMIIII MIM MIMM 

GWDQDTSDAPGTLRNLRLRLI SRPTCNCI YNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 239 
GDSGGPVLCLEPDGHWVQAGI I SFAS SCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQS 894 

1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 M I 

GDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQS 299 
PETPEMSDEDSCVACGSLRTAG PQAGAPS PWPWEARLMHQGQLACGGALVSEEAVLTAAH 1074 

1 1 1 1 M 1 1 1 II 1 1 1 1 1 1 1 1 1 1 II 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 I 

PETPEMSDEDSCVACGSLRTAGPQAGAPS PWPWEARLMHQGQLACGGALVSEEAVLTAAH 359 
CFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPL 1254 

I M 1 1 1 1 1 1 1 1 1 1 M 1 1 M I M 1 1 1 1 M 1 1 1 1 1 1 M M 1 1 1 1 1 1 1 II 1 1 11 1 II 

CFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPL 419 

CLPYADHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPG 1434 

I I I I IIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

CLPYPDHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPG 479 

MVCTSAVGELPSCE 1476 

Illillllllllll 

MVCTSAVGELPSCE 493 

Score = 324 (114.1 bits), Expect = 7.0e-26, P = 7.0e-26 (SEQ ID NO:111)
Identities = 91/250 (36%), Positives = 123/250 (49%), Frame = +1

PQEGNTVPGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSCVRDSAPG 366

II I I 1 1 1 + 1 + II I Ml" MIMM! Ill +1 

PQAG- - APSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRP- 378 






Query: 


19 




Sbjct : 


1 


30 


Query: 


199 




Sbjct: 


60 


35 


Query: 


358 




Sbjct: 


120 




Query: 


535 


40 


Sbjct: 


180 




Query: 


715 


45 


Sbjct: 


240 




Query: 


895 




Sbjct: 


300 


50 


Query: 


1075 




Sbjct: 


360 


55 


Query: 


1255 




Sbjct: 


420 




Query: 


1435 


60 


Sbjct: 


480 




Score 


= 324 




Identities = 


65 


Query: 


187 




Sbjct: 


322 



10 



25 



40 



Query: 367 AEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTH TPLCLPQPAHRFPFGASCWAT 534 

|| |+ || || I I Kill II I I Mill I I I I I 

Sbjct: 379 -EEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYPDHHLPDGERGWVL 437 

Query: 535 GWDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSN--PARPGMLCGGPQPGVQGP 708 

| + + +|+ + + |+ 1+ +11 + I lll+l I 

Sbjct: 438 GRARPGAGI - SSLQTVPVTLLGPRACS RLHAAPGGDGSPILPGMVCTSAV-GELPS 491 

Query: 709 CQGDSGG PVLCLEPDGHWVQAGI I S FAS SCAQEDAPVLLTNTAAHSSWLQARVQGAAFLA 888 

1 + 1 I I I I I II I l + I I + l I + I 1 + l + + * + I 

Sbjct: 492 CEGLSGAP- LVHEVRGTWFLAGLHS FGDACQGPARPAVFTALPAYEDWVS S - LDWQVYFA 549 



15 Query: 889 QSPETPEMSDEDSCVA 936 

+ II II IN 

Sbjct: 550 EEPE- PE-AEPGSCLA 563 

>patp:Y90291 Human peptidase, HPEP-8 protein sequence - Homo sapiens, 267 aa. (SEQ ID NO: 66)

Length = 267
Plus Strand HSPs:

Score = 1028 (361.9 bits), Expect = 5.0e-103, P = 5.0e-103
Identities = 189/189 (100%), Positives = 189/189 (100%), Frame = +1



Query:  910 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 1089
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 60

Query: 1090 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 1269
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 120

Query: 1270 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 1449
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  121 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 180

Query: 1450 AVGELPSCE 1476
            |||||||||
Sbjct:  181 AVGELPSCE 189
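
The query coordinates in the translated (Frame = +1) alignment above are nucleotide positions, while the subject coordinates count amino acids. The following is a minimal Python sketch of that bookkeeping, not part of the original search output; the function names are hypothetical and the coordinate pairs are copied from the alignment above.

```python
def translated_length(nt_start: int, nt_end: int) -> int:
    """Return the number of codons covered by an inclusive nucleotide range."""
    span = nt_end - nt_start + 1
    if span % 3 != 0:
        raise ValueError("range does not cover a whole number of codons")
    return span // 3


def reading_frame(nt_start: int) -> int:
    """Return the forward reading frame (1, 2 or 3) of a 1-based start position."""
    return (nt_start - 1) % 3 + 1


# Query ranges of the four alignment rows against Y90291 (taken from the listing).
segments = [(910, 1089), (1090, 1269), (1270, 1449), (1450, 1476)]
lengths = [translated_length(start, end) for start, end in segments]

print(lengths)             # residues aligned per row
print(sum(lengths))        # total aligned residues, compare with "189/189"
print(reading_frame(910))  # compare with "Frame = +1"
```

Under these assumptions the per-row residue counts add up to the alignment length reported in the Identities line, and the start position falls in forward frame 1.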



Score = 316 (111.2 bits), Expect = 4.2e-27, P = 4.2e-27 (SEQ ID NO:112)
Identities = 90/250 (36%), Positives = 122/250 (48%), Frame = +1 

PQEGNTVPGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSCVRDSAPG 366

II I I lll+l + II I 1 + 11++ IIIIIMI III +1 

PQAG--APSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRP- 74

AEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTH TPLCLPQPAHRFPFGASCWAT 534 

|| |+ I I II I I l+lll II I I I I I I I III I 

-EEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWVL 133 

GWDQDTSDAPGTLRNLRLRLI SRPTCNC I YNQLHQRHLSN- - PARPGMLCGGPQPGVQGP 708 

| + + +|+ + + |+ 1+ +11 + I lll+l I 

GRARPGAGI -SSLQTVPVTLLGPRACS RLHAAPGGDGSPILPGMVCTSAV-GELPS 187 

CQGDSGG PVLCLE PDGHWVQAGI I SFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLA 888 

|+| || | I I I I 11+ II +1 I + I 1+ 1+ + + * I 

CEGLSGAP - LVHEVRGTWFLAGLHS FGDACQG PARPAVFTALPAYEDWVS S - LDWQVYFA 245 

QSPETPEMSDEDSCVA 936 

+ II II ++ 
Query:


187 


50 


Sbjct: 


18 




Query: 


367 


55 


Sbjct: 


75 




Query: 


535 




Sbjct: 


134 


60 


Query: 


709 




Sbjct: 


188 


65 


Query: 


889 




Sbjct: 


246 



Table 11. BLASTN identity search (versus the human SeqCalling database) for the
Peptidase-like protein of the invention.

>s3aq:153687026 Category D: 377 frag (6 5'sig-CG, 204 non-5'sig-CG, 167 non-CG EST), 1114 bp. (SEQ ID NO:67)

Length = 1114

Minus Strand HSPs:

Score = 894 (134.1 bits), Expect = 3.1e-35, P = 3.1e-35

Identities = 182/186 (97%), Positives = 182/186 (97%), Strand = Minus / Plus

Query:  186 CTTGGGTTGGCCCTGGGGTTCTTGGACCTTCCGGAAACTGAGCCACATCAGGCTCACGTT 127
            ||| |  | |||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  413 CTTAGCCTTGCCCTGGGGTTCTTGGACCTTCCGGAAACTGAGCCACATCAGGCTCACGTT 472

Query:  126 GATAGCATAGGTGGTGATACAAACAATGCAGAAATCATAGAGCACGAAGAACAGGATCCA 67
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  473 GATAGCATAGGTGGTGATACAAACAATGCAGAAATCATAGAGCACGAAGAACAGGATCCA 532

Query:   66 GGCCAGGTAGACAGAACCAGCGAGAGACACCAGGGAGCTCAGCAGCATCAGGACAGAGGC 7
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  533 GGCCAGGTAGACAGAACCAGCGAGAGACACCAGGGAGCTCAGCAGCATCAGGACAGAGGC 592

Query:    6 CCAGCG 1
            ||||||
Sbjct:  593 CCAGCG 598
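
In a "Minus / Plus" alignment such as the one above, the query is displayed as its reverse complement, which is why the query coordinates run downward from 186 to 1. The following is a minimal Python sketch of that relationship, not part of the original search output; the helper name is hypothetical, and the test string is the first twelve bases of the query as shown in the plus-strand alignments of this table.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Query positions 1-12 on the plus strand, as listed in the plus-strand hits.
query_1_to_12 = "CGCTGGGCCTCT"

# Read on the minus strand, these bases appear as positions 12..1.
minus_view = reverse_complement(query_1_to_12)
print(minus_view)
```

The last six bases of the result correspond to the row labelled "Query: 6 ... 1" in the minus-strand alignment, and the six before them to the end of the row labelled "Query: 66 ... 7".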
>s3aq:152507187 17 frag (1 5'sig-CG, 7 non-5'sig-CG, 9 non-CG EST), 588 bp. (SEQ ID NO:68)

Length = 588 



Plus Strand HSPs: 

Score = 882 (132.3 bits), Expect = 2.1e-34, P = 2.1e-34 
Identities = 178/180 (98%), Positives = 178/180 (98%), Strand = Plus / Plus 

Query:    1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
            ||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||
Sbjct:  367 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGTTTCTGTCTAC 426

Query:   61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  427 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 486

Query:  121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCCAA 180
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||
Sbjct:  487 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGGCAA 546
>s3aq:153485867 Category D: 3 frag (1 non-5'sig-CG, 2 non-CG EST), 612 bp. (SEQ ID NO:69)

Length = 612 
Plus Strand HSPs: 
Score = 785 (117.8 bits), Expect = 1.7e-29, P = 1.7e-29 

Identities = 157/157 (100%), Positives = 157/157 (100%), Strand = Plus / Plus 
Query:    1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  456 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 515

Query:   61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  516 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 575

Query:  121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 157
            |||||||||||||||||||||||||||||||||||||
Sbjct:  576 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 612

>s3aq:153485864 Category D: 2 frag (2 non-5' sig-CG), 425 bp. (SEQ ID NO:70)
Length = 425

Plus Strand HSPs:

Score = 785 (117.8 bits), Expect = 2.4e-29, P = 2.4e-29
Identities = 157/157 (100%), Positives = 157/157 (100%), Strand = Plus / Plus

Query:   1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 269 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 328

Query:  61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 329 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 388

Query: 121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 157
           |||||||||||||||||||||||||||||||||||||
Sbjct: 389 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 425



Table 12. ClustalW alignment of the protein of the invention.

Information for the ClustalW proteins:

Accno                        Common Name                                  Length
CG50817-05 (SEQ ID NO:45)    novel Peptidase-like protein
Y41704 (SEQ ID NO:122)       Human PR0351 protein sequence.               571
Y90291 (SEQ ID NO:123)       Human peptidase, HPEP-8 protein sequence.    267

In the alignment shown above, black outlined amino acid residues indicate regions of 
conserved sequence (i.e., regions that may be required to preserve structural or functional 
properties); greyed amino acid residues can be mutated to a residue with comparable steric 
and/or chemical properties without altering protein structure or function (e.g., L to V, I, or M);
non-highlighted amino acid residues can potentially be mutated to a much broader extent without 
altering structure or function. 
Table 13. Psort, SignalP and hydropathy results for CG50817-05

plasma membrane                    Certainty=0.6850 (Affirmative) < succ>
endoplasmic reticulum (membrane)   Certainty=0.6400 (Affirmative) < succ>
Golgi body                         Certainty=0.3700 (Affirmative) < succ>
microbody (peroxisome)             Certainty=0.1187 (Affirmative) < succ>

INTEGRAL Likelihood = -8.44 Transmembrane 15-31 (1-38)

Seems to be a Type II (Ncyt Cexo) membrane protein

Is the sequence a signal peptide?

# Measure   Position   Value   Cutoff   Conclusion
  max. C    36         0.688   0.37     YES
  max. Y    36         0.555   0.34     YES
  max. S    10         0.991   0.88     YES
  mean S    1-35       0.875   0.48     YES

# Most likely cleavage site between pos. 35 and 36: TYA-IN
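
By way of non-limiting illustration, the signal peptide conclusions tabulated above can be reproduced with the short Python sketch below. It assumes, as the table suggests, that a measure is called YES when its value exceeds its cutoff. The names SIGNALP_ROWS, conclusion and summarize are hypothetical and are not part of the SignalP program.

```python
# Values and cutoffs copied from the signal peptide table for CG50817-05.
# Assumption: a measure is scored "YES" when its value exceeds its cutoff.
SIGNALP_ROWS = [
    ("max. C", 0.688, 0.37),
    ("max. Y", 0.555, 0.34),
    ("max. S", 0.991, 0.88),
    ("mean S", 0.875, 0.48),
]


def conclusion(value: float, cutoff: float) -> str:
    """Return the YES/NO call for one measure under the stated assumption."""
    return "YES" if value > cutoff else "NO"


def summarize(rows):
    """Map each measure name to its YES/NO call."""
    return {name: conclusion(value, cutoff) for name, value, cutoff in rows}


# The table places the most likely cleavage site between positions 35 and 36,
# so under that prediction the mature polypeptide would begin at residue 36.
CLEAVAGE_AFTER = 35
MATURE_START = CLEAVAGE_AFTER + 1

if __name__ == "__main__":
    for name, call in summarize(SIGNALP_ROWS).items():
        print(name, call)
    print("mature polypeptide starts at residue", MATURE_START)
```

All four measures exceed their cutoffs, which agrees with the YES entries in the Conclusion column.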

SECP13

A SECP13 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:46) and encoded polypeptide sequence (SEQ ID NO:47) of clone
CG50817-06, directed toward novel peptidase (HPEP-8)-like proteins and nucleic acids encoding
them. This clone is a related variant of SECP11 and SECP12, clones CG50817-04 and CG50817-05.
Figure 18 illustrates the nucleic acid sequence and the amino acid sequence, respectively. This
clone includes a nucleotide sequence (SEQ ID NO:46) of 1200 bp. The nucleotide sequence
includes an open reading frame (ORF) beginning with an ATG initiation codon at nucleotides
33-35 and ending with a TGA codon at nucleotides 945-947. Putative untranslated regions, if
any, are found upstream from the initiation codon and downstream from the termination codon.
The encoded protein having 304 amino acid residues is presented using the one-letter code in
Figure 18.
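
The relationship between the ORF coordinates and the stated protein length can be checked with the following Python sketch. It assumes 1-based nucleotide numbering and that the termination codon is not translated; the function name orf_protein_length is hypothetical.

```python
def orf_protein_length(start_codon_first_nt: int, stop_codon_first_nt: int) -> int:
    """Number of residues encoded between an initiation codon and a stop codon.

    Both arguments are 1-based positions of the first nucleotide of the codon.
    The stop codon itself is assumed not to encode a residue.
    """
    coding_nt = stop_codon_first_nt - start_codon_first_nt
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return coding_nt // 3


if __name__ == "__main__":
    # ATG at nucleotides 33-35 and TGA at nucleotides 945-947, as stated above.
    print(orf_protein_length(33, 945))
```

With the coordinates given for SEQ ID NO:46, the coding region spans nucleotides 33 to 944 (912 nucleotides), which corresponds to the 304 amino acid residues stated above.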

The protein encoded by clone CG50817-06 is predicted by the PSORT program to localize to the
cytoplasm with a certainty of 0.4500, and does not appear to contain a signal peptide (see Table 18
below).

The DNA sequence and protein sequence for a novel Peptidase-like gene or one of its
splice forms thus derived is reported here as the invention CG50817-06. Genomic clones
having regions with 100% identity to the extended sequence thus obtained were identified by
BLASTN searches with the extended sequence against human genomic databases. The genomic
clone was selected for further analysis because this identity indicates that these clones contain
the genomic locus for these SeqCalling assemblies.



The regions defined by all approaches were then manually integrated and manually
corrected for apparent inconsistencies that may have arisen, for example, from miscalled bases in
the original fragments used, or from discrepancies between predicted homology to a protein of
similarity, to derive the final sequence of the invention CG50817-06 reported here. When
necessary, the process to identify and analyze SeqCalling assemblies, ESTs and genomic clones
was reiterated to derive the full length sequence.

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 840 of 842 bases (99%) identical to a gb:z34002 Human PR0351
nucleotide sequence from Homo sapiens (Tables 14 and 16). The full amino acid sequence of
the protein of the invention was found to have 278 of 279 amino acid residues (99%) identical to,
and 278 of 279 amino acid residues (99%) similar to, the 571 amino acid residue Y41704 Human
PR0351 protein from Homo sapiens (Table 15).
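
The identity figures quoted above can be illustrated with the following Python sketch, which counts identical positions in two aligned sequences of equal length and converts the count to a whole-number percentage. It assumes that percentages are truncated rather than rounded, which is consistent with 178/180 being reported as 98% in the alignment of SEQ ID NO:68; the function names are hypothetical and this is not the BLAST implementation.

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (identical positions, alignment length) for two aligned strings.

    Gap characters ("-") never count as identities.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return matches, len(query)


def percent_identity(matches: int, length: int) -> int:
    """Whole-number percent identity, truncated (assumption, see lead-in)."""
    return (100 * matches) // length


if __name__ == "__main__":
    # First 60-nucleotide block of the alignment with SEQ ID NO:68,
    # which differs at a single position (G versus T).
    query = "CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC"
    sbjct = "CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGTTTCTGTCTAC"
    print(count_identities(query, sbjct))
    print(percent_identity(840, 842), percent_identity(278, 279))
```

Under the truncation assumption, 840 of 842 bases and 278 of 279 residues both correspond to 99%, as stated above.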

A multiple sequence alignment is given in Table 17, with the protein of the invention 
being shown on the first line in a ClustalW analysis comparing the protein of the invention with 
related protein sequences. 

The presence of identifiable domains in the protein disclosed herein was determined by
searches using algorithms such as PROSITE, Blocks, Pfam, ProDomain, and Prints, and then
determining the Interpro number by crossing the domain match (or numbers) using the Interpro
website. The results indicate that this protein contains the following protein domains (as defined
by Interpro) at the indicated positions: domain name trypsin at amino acid positions 1 to 62, and
domain name trypsin at amino acid positions 95 to 259. This indicates that the sequence of the
invention has properties similar to those of other proteins known to contain this/these domain(s)
and similar to the properties of these domains.

Chromosomal information: 

The Peptidase disclosed in this invention maps to chromosome 16. This information was
assigned using OMIM, the electronic northern bioinformatic tool implemented by CuraGen
Corporation, public ESTs, public literature references and/or genomic clone homologies. This
was executed to derive the chromosomal mapping of the SeqCalling assemblies, genomic
clones, literature references and/or EST sequences that were included in the invention.

Tissue expression 

The Peptidase disclosed in this invention is expressed in at least the following tissues:
adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus, brain -
substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal lung,
heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta, prostate,
salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid,
trachea, uterus. This information was derived by determining the tissue sources of the sequences
that were included in the invention including but not limited to SeqCalling sources, Public EST
sources, and/or RACE sources.

Cellular Localization and Sorting 

The SignalP, Psort and/or Hydropathy profiles for the Peptidase-like protein are shown in
Table 18. The results predict that this sequence has no signal peptide and is likely to be localized
in the cytoplasm, with a certainty of 0.4500 as predicted by PSORT.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding a Peptidase-like protein includes the 
nucleic acid whose sequence is provided in Figure 18, or a fragment thereof. The invention also 
includes a mutant or variant nucleic acid any of whose bases may be changed from the 
corresponding base shown in Figure 18 while still encoding a protein that maintains its 
Peptidase-like activities and physiological functions, or a fragment of such a nucleic acid. The 
invention further includes nucleic acids whose sequences are complementary to those just 
described, including nucleic acid fragments that are complementary to any of the nucleic acids 
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or 
complements thereto, whose structures include chemical modifications. Such modifications 
include, by way of non-limiting example, modified bases, and nucleic acids whose sugar 
phosphate backbones are modified or derivatized. These modifications are carried out at least in 
part to enhance the chemical stability of the modified nucleic acid, such that they may be used, 
for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the 



mutant or variant nucleic acids, and their complements, up to about 1% of the residues may be so
changed. 

The novel protein of the invention includes the Peptidase-like protein whose sequence is
provided in Figure 18. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 18 while still yielding
a protein that maintains its Peptidase-like activities and physiological functions, or a functional
fragment thereof. In the mutant or variant protein, up to about 1% of the residues may be so
changed.
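
As a rough illustration of the "up to about 1%" limit, the following Python sketch converts the limit into a whole number of changed positions for the 1200 bp nucleotide sequence and the 304 residue protein described for this clone. Treating "about 1%" as exactly 1% and truncating to a whole number are simplifying assumptions made only for this example; the function names are hypothetical.

```python
def max_changed_positions(length: int, percent: int = 1) -> int:
    """Largest whole number of positions not exceeding `percent` of `length`."""
    return (length * percent) // 100


def count_changes(reference: str, variant: str) -> int:
    """Count substituted positions between two sequences of equal length."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return sum(1 for r, v in zip(reference, variant) if r != v)


def within_limit(reference: str, variant: str, percent: int = 1) -> bool:
    """True if the variant differs at no more than `percent` of positions."""
    limit = max_changed_positions(len(reference), percent)
    return count_changes(reference, variant) <= limit


if __name__ == "__main__":
    print(max_changed_positions(1200))  # nucleotide sequence of 1200 bp
    print(max_changed_positions(304))   # protein of 304 residues
```

Under these assumptions the limit corresponds to at most 12 changed bases in the nucleic acid and at most 3 changed residues in the protein.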

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, and map location for the
Peptidase-like protein and nucleic acid disclosed herein suggest that this Peptidase may have
important structural and/or physiological functions characteristic of the Serine protease family.
Therefore, the nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications and as a research tool. These include serving as a specific or selective
nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of
the nucleic acid or the protein is to be assessed, as well as potential therapeutic applications
such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody
target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in
gene therapy (gene delivery/gene ablation), (v) a composition promoting tissue regeneration
in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications implicated in various diseases and disorders described below and/or
other pathologies. For example, the compositions of the present invention will have efficacy for
treatment of patients suffering from: cell proliferative disorder; arteriosclerosis; psoriasis;
myelofibrosis; cancer; autoimmune disorder; Crohn's disease; inflammatory disorder; AIDS;
anaemia; allergy; asthma; atherosclerosis; Graves' disease; multiple sclerosis; scleroderma;
infection; diabetes; metabolic disorder; Addison's disease; cystic fibrosis; glycogen storage
disease; obesity; nutritional edema; hypoproteinemia; and other diseases, disorders and conditions
of the like.

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic 
methods. 

Table 14. BLASTN identity search for the nucleic acid of the invention. 

>patn:Z34002 Human PR0351 nucleotide sequence - Homo sapiens, 2365 bp. (SEQ ID NO:71)
Length = 2365

Plus Strand HSPs:

Score = 4192 (629.0 bits), Expect = 1.9e-184, P = 1.9e-184
Identities = 840/842 (99%), Positives = 840/842 (99%), Strand = Plus / Plus
Query:    1 AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  936 AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 995

Query:   61 TGCAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGAC 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  996 TGCAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGAC 1055

Query:  121 ACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTG 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1056 ACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTG 1115

Query:  181 TGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAG 240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1116 TGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAG 1175

Query:  241 CTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCT 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1176 CTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCT 1235

Query:  301 GTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGG 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1236 GTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGG 1295

Query:  361 CCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGG 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1296 CCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGG 1355

Query:  421 TGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGC 480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1356 TGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGC 1415

Query:  481 TGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCC 540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1416 TGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCC 1475



Query:  541 ACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAG 600
Sbjct: 1476 [illegible] 1535

Query:  601 CCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTG 660
            ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Sbjct: 1536 CCAGCCTGCGGCCCCTCTGCCTGCCCTATCCTGACCACCACCTGCCTGATGGGGAGCGTG 1595

Query:  661 GCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCG 720
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1596 GCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCG 1655

Query:  721 TGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCA 780
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1656 TGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCA 1715

Query:  781 GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGG 840
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1716 GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGG 1775

Query:  841 [illegible]
Sbjct: 1776 [illegible]

Score = 1915 (287.3 bits), Expect = 1.4e-81, P = 1.4e-81 (SEQ ID NO:114)
Identities = 635/848 (74%), Positives = 635/848 (74%), Strand = Plus / Plus



Query: 
Sbjct: 
Query : 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 



353 CTGGGAGGCC AGGCTGATGCAC- CAGGGACAGCTGGCCTGTGGCGGAGC - -CCTGG — TG 407 

III I MM III II II I Mil II lllll III I MM I 

1508 CTGCTGGCCCAGCCTG-TG-ACACTGGGA--GCCAGCCTGCGGCCCCTCTGCCTGCCCTA 1563 

408 TCA-GAGGAGGCGGTGC-TAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCC-CAGAG 464 

II III III I I II I I III II I I I lllll I II 

1564 TCCTGACCACCACCTGCCTGA-TGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAG 1622 



465 



522 



GAATGGAGCGTAGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCAT — CC 

II I II I I II I Ml II I III III I II I II 

1623 GAGCAG - GC ATC AG - CTCCCT - C C AGACAGTGCCCGTGAC - CCTC CTGGGGCC TAGGGCC 1678 



523 



582 



TGCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCC 

MM I I II II II I I lllll I I II lllll I ill! II 

1679 TGCA- GCCGGCTGCATGCAGC -TCCTGGGGGTGATGGCA- -GCCCTATT - CTGCCGGGGA 1733 

583 AGCCTGTG-ACACTGGGA-GCCAGCCTGCGGCCCCTCTGCCTGC-CCTATGCTGAC-CAC 638 

I MM II I I I I I II MM III I III I III III 

1734 TGG-TGTGTAC - CAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGCCTGT - CTGGGGC AC 1790 



Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 



639 



1791 



697 



CACC — TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCA 696 

III III III II III I I lllll II Mill II III III 

CACTGGTGCATGA - GG TGAGGGGCACATGGTTCC TGGCCGGGCT - GC ACAGCTTCGGAGA 1848 



T-CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGC 753 

I I II III III III I I I II I II MM I I I 
1849 TGCTTGCCAAGGCCCCGCCAG-GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT-GAGGAC 1906 

754 TGCATGCAGCTCCTGGGGGTGATGGCAGCCCTA-TTCTGCCGGGGATGGTGTGTACCAGT 812 

II I MM I II II I III III III MM III III II 

1907 TGGGT - CAGCAGTTTGGACTG — G-CAGGTCTACTTC -GCCGAGGAACCAGAGCCCGAG- 



813 



1960 



868 



GCTGTGGGTG - A- GCTGCCCAGCTGTGAG — GCCAACCAACCAGCTGCTGACAGGGGACC 

MM I II I llllli II I II 1 1 1 1 1 1 1 M 1 1 M 1 1 1 II 1 1 1 1 1 1 1 1 1 

1961 GCTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGACAGGGGACC 2020 
869 TGGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTC 928 

lllllllllllllll Ml I MIMMIMI Ml Ml llllli III MIMI Ml III II 

2021 TGGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTC 2079 
929 CCCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGG 988 

lllll MM II MM III II MIMI MIMI Mill II MIMI II III III III MM 

2080 CCCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGG 2139 








Query : 


989 




Sbjct : 


2140 


5 


Ouerv: 


1049 




Sbjct: 


2200 


10 


Ouerv : 


1109 




Sbjct: 


2260 




Query: 


1169 


15 


Sbjct: 


2320 




Score 


= 267 




Identities = 


20 


Query: 






Sbjct: 


424 


25 


Query: 


334 




Sbjct: 


478 




Query: 


393 


30 


l>DJ Ct . 






Query: 


452 


35 


Sbjct: 


596 




Query: 


506 




Sbjct: 


652 


40 


Query: 


563 




Sbjct: 


710 


45 


Query: 


621 




Sbjct: 


763 




Query: 


678 


50 


Sbjct: 


821 




Query: 


735 


55 


Sbjct: 


877 




Query: 


791 




Sbjct: 


934 


60 


Query: 


849 




Sbjct: 


989 



GAAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGG ACAGGGGTGTCTG 1048 

II Mill I llllllll II III II II I Mill lllllllllllllllll 1 1 III 1 1 III II 

GAAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTG 2199 



TGGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTA 

II llllllllllll llllll I Mill I lllllllllllllllll Mill Mil II Hill 

TGGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTA 



I | || I II I I I I II II II II I II II II II II I II II II II II I I II II I II I I II I I I II 



1108 



2259 



2319 



;i ii i ii i ii ii ii ii ii i M ii ii i in 1 1 



Score = 267 (40.1 bits), Expect = 0.0078, P = 0.0078 (SEQ ID NO:115)
Identities = 349/598 (58%), Positives = 349/598 (58%), Strand = Plus / Plus

333 



GAGTGA-TGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGGACAGCAGGTCCCCAGGCAG 

I Ml II lllllll III III II II III MM II II 

GCGTGCCTGTGGACAGC-GTG — GCCCC-GGCCCCCCCAAGCCT-CAGGAGGGCAA-CAC 477 
GAGCACCCTCCCCA-TGGCCCTGGGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGT 392 

II MM I I II M II I II lllllll III I II III I III 

-AGT-CCCTGGCGAGTGGCCCTGGCAGGCCAGTGTGAGGAGGCAAGGAGCCCACATCTGC 535 
GGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCTTC- ATTGGGCG 451 

MM llllllll MM III II lllllllllllllllll I III 

AGCGGCTCCCTGGTGGCAGACACCTGGGTCCTCACTGCTGCCCACTGCTTTGAAAAGGCA 595 
CCAGGCCCCAGAG - - GAATGGAGCGT - AG - GG - CTGGGG ACC AGACCGGAGGAGTG -GGG 505 

III I Mil MM II II II I III I II II III III 



CCTGAAGCAGCTCATCCTGCAT -GGAGCCTACACCCACCCTGAGGGGGGCTACGAC — AT 562 

II I II II I I I II I I HIM I I I I I I II 

ACTCA-GCC-CTGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTGCCCAGGGCCTAT 709 



II II I II III III MM I I III M III I II II I I I 



CTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGT-TCTGG-GACGG- 677 

III II III MUM I I I I III I III I I 

GA-CCC-ACACACCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGAGCCTCC 820 
-GCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGC-CC -GTGACCCTCCTGGGG 734 

ii i iii ii ii inn ii iii i iii ii i mill i i 



II I MUM I I I II I II I I II II II II II II 



II I II III I II I I I I II I III I M I MM III 

- CCAGCGACACCTGTCCAACCCGGCCCG - GCCTGGGATGCTATG - TGGG - GGCCC - CC AG 988 



II I I II HIM MM I 



>patn:A37664 Human peptidase, HPEP-8 coding sequence - Homo sapiens, 1661 bp. (SEQ ID NO:72)
Length = 1661

Plus Strand HSPs:

Score = 3831 (574.8 bits), Expect = 5.6e-168, P = 5.6e-168
Identities = 767/768 (99%), Positives = 767/768 (99%), Strand = Plus / Plus



Query: 


75 


Sbjct: 


320 


Query: 


135 


Sbjct: 


380 


Query: 


195 


Sbjct: 


440 


Query: 


255 


Sbjct: 


500 


Query: 


315 


Sbjct: 


560 


Query: 


375 


Sbjct: 


620 


Query: 


435 


Sbjct: 


680 


Query: 


495 


Sbjct: 


740 


Query: 


555 


Sbjct: 


800 


Query: 


615 


Sbjct: 


860 


Query: 


675 


Sbjct: 


920 


Query: 


735 


Sbjct: 


980 


Query: 


795 


Sbjct: 


1040 


Score 


= 193! 


Identities : 


Query: 


.353 


Sbjct: 


818 


Query: 


410 


Sbjct: 


874 


Query: 


466 


Sbjct: 


934 



CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 

illll IIIIMIIIIIIIIIIIIIIIIII IIIIMIIIMIMM IMIMMM Mill 

CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 
GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 

Mill II II 1 1 1 II II II II 1 1 1 II Nil IM1II II III II II I 1 1 II 1 1 1 1 1 II 1 1 1 

GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 
ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 

II II II II 1 1 III II II 1 1 1 II II I! II II MM II Mill II I 1 1 1 1 1 1 1 1 1 1 1 1 ! I 

ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 



134 



379 



194 



439 



254 



499 



II 1 1 1 II II 1 1 II 1 1 1 II I M II II 1 1 II II I Ml 11 



Ml 1 1 II I 



ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 

II Ml II II II II 1 1 1 IMM 1 1 III I M M MMM I MM II I 1 1 1 1 1 II II I 1 1 1 1 

ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 
CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCC 

Illll II M 11 1 1 III 1 1 Ml 1 1 Ml I II llllllll Mill 1 1 1 I II 1 1 1 1 1 1 1 I M I 

CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCC 



CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 

Illll II II III IMM MM Ml I II II MM II III II I II 1 1 Illll III Illll 

CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 



374 



619 



434 



679 



494 



739 



554 



GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 

II I MM Ml III 1 1! II I II II 1 1 II II II I M I II MM I II I I I 1 1 1 II II I III II 

GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 799 
TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 614 

1 1 1 1 M 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 f 1 M 1 1 1 1 1 M 1 1 M 1 1 1 1 1 1 M M 1 1 1 1 1 M 1 1 M I 

TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 859 
CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 674 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 919 



CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 

Mill II IMMIMI II MM Illll Illll IIIIMIIM llllllll II III Illll 

CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 



734 



979 



794 



CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 

III MM II 1 1 M 1 1 Ml Ml II III Ml MMM II II II I II 1 1 II 1 1 Ml 1 1 II I II 

CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 1039 
GGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGCC 842 

MINIMI IIIIMMMIIIIIIIIIIIIMIIIIIMIIMI I I 

GGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGC 1087 



Ml I MM Ml II II I MM II Illll III I 



a Plus / Plus 
-CCTGGTGTC 409 

MM I 



II III II I I II II II I I I Illll I III 



I I II I I II 



Mill 



III III I II I Ml 






Query: 


524 


Sbjct: 


990 


Query: 


584 


Sbjct: 


1045 


Query: 


640 


Sbjct: 


1102 


Query: 


698 


Sbjct: 


1160 


Query: 


755 


Sbjct: 


1218 


Query: 


814 


Sbjct: 


1272 


Query: 


870 


Sbjct: 


1332 


Query: 


930 


Sbjct: 


1391 


Query: 


990 


Sbjct: 


1451 


Query: 


1050 


Sbjct: 


1511 


Query: 


1110 


Sbjct: 


1571 


Query: 


1170 


Sbjct: 


1631 


Score 


= 559 


Identities 


Query: 


1 


Sbjct: 


93 


Query: 


61 


Sbjct: 


153 


Query: 


118 


Sbjct: 


211 


Query: 


174 


Sbjct: 


269 


Query: 


227 



III I I II II II I I Mill I I II Mill I MM II 

GCA-GCCGGCTGC ATGC AGC - TCCTGGGGGTGATGGCA- -GCCCTATT-CTGCCGGGGAT 1044 



i mi ii i i i i i ii mi m i in i in mi 



ii in in ii in i i inn ii inn ii m i i n 

ACTGGTGCATGA-GGTGAGGGGC ACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGAT 1159 

-CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCT 754 
I II II I III III I I I II I II mi I I II 



i i mi i ii ii i iii iii in mi iii i i i ii i 

GGGT- CAGCAGTTTGGACTG- -G - CAGGTCTACTTC -GCCGAGGAACCAGAGCCCGAG -G 1271 
CTGTGGGTG-A-GCTGCCCAGCTGTGAG- -GCCAACCAACCAGCTGCTGACAGGGGACCT 869 

m i ii i mm ii i in i in ii ii ii n ii i ii i ii m ii n 

CTGAGCCTGG AAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGAC AGGGGACCT 1331 
GGCCATTCTCAGGAACAAGAGAATGC AGGCAGGC AAATGGCATTACTGCCCCTGTCCTCC 929 

1 1 1 i 1 1 1 1 1 1 1 1 1 1 mini iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii 

GGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1390 
CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 989 

imnn ii i iii nn i mi i ii mi mi nm mi imimimi n ii 

CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 1450 
AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGAC AGGGGTGTCTGT 1049 

II I II 1 1 1 1 1 (I I M I i I I II I II I M Ii 1 1 1 1 1 1 1 1 1 1 1 1 M M 1 1 1 1 II II II I II II 

AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1510 
GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1109 

iiniimimmimiimmmiiimnnmiiimmiiiim 

GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1570 
CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 1169 

IMIMIMI I IMIMIMI 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I II 1 1 1 1 ! 1 1 1 1 

CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 1630 
CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1200 

I ] F 1 1 1 J 1 1 1 1 1 1 1 F 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 

CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1661 

Score = 559 (83.9 bits), Expect = 8.2e-17, P = 8.2e-17 (SEQ ID NO:117)
Identities = 609/1017 (59%), Positives = 609/1017 (59%), Strand = Plus / Plus

AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 60 

II I M !l 1 1 II ! 1 1 II II II 1 1 1 1 1 1 ill II II III I !l III 1 1 1 1 1 ll I III M ! 1 1 1 1 

AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 152 
TGCAGGGCCCCTGTCAGGGA-GATTCCGGGG-GCCCTGT-GCTGTGCCTCGAGCCTGACG 117 

II Ml II IIIIM lllll Ml I I I I II I III I I 



II Mill I II I I I III III 



-CTTTGCATCA-AGCTGTGCCCAGGAGGAC 173 
II I I lllll III III 



II II I MM I I II III II 



II III IIIIM II I III III 



lllll 




I I I III lllll I 



I III I II I I II I II 



Sbjct: 



325 AGATTCCGGGGGCC-CTGTGCTGTGCCTCGAGCCTGACGGACACTGG-GTTCAG-GCTG- 380 



Query: 284 GG AC AGCTGTGTAGCCTGTGGATCCT - - TG AGGACAGCAGGTC - C - CCAG - GC AGGAGCA 338 

I II I I III I II III III III III I I II I II I II 

Sbjct: 381 -G-CATCA-TC-AGCTT-TGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTG-A 434 

Query: 339 CCCTCCCCATGGCCCTGGGAGG-CCAGGCTG-ATGCACCAGGGACAGCTGGCCTGTGGCG 396 

II | | I I II II II Mill I II I II III Ml I I 

Sbjct: 435 CCAACAC-A-GCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTT-CAGGGGGCAGCTTTCC 491 

Query: 397 GAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGG 456 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I 

Sbjct: 492 TGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTG-GA 550 

Query: 457 CCCCAGAGGAATGGAG — CGTAGGGCTGGGG-ACCAGACCGGAGGAGTGGGGCCTGAAGC 513 

II Mill I II I III II I III II II III II 

Sbjct: 551 TCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAG- 609 

Query: 514 AGCTCATCCTGC ATGGAGC - CTACACCCACCCTGAGGGGGGCTA- C - GACATGGCCCTCC 570 

III II I II III III I III I I I II III I I 

Sbjct: 610 -GCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGC 668 

Query: 571 TG - CTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTAT 629 

I 1 1 1 1 1 1 1 1 1 1 III II Mil Mill I II II II M 

Sbjct: 669 TAACTGCTG-CCCA--CTGCTTCATTGGGCGCCAGGCCCCAGAGGAA-TGGA-GCG-TAG 722 
Query: 630 G- CTGACCACCAC - CTGCCTGA- TGGGGAGCGTGGCTGGGT- TCTGGGACGGGCCCGCCC 685 

I III MM II II Mill I II I II I I II II I 

Sbjct: 723 GGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTAC- 781 
Query: 686 AGG AGC AGGC ATC AGCTCC - CTC C AG AC AGTGCC CGTGACCCTCCTGGG GCC TAGGG 741 

I II I I II II I MM Mill II MM MM I 

Sbjct: 782 ACC - - C ACCC - TGAGGGGGGCTAC -G ACATGGCCC - TCCTGC TGCTGGCCCAGCCTGTGA 836 
Query: 742 C -CTGC - AGCCGGC- TGC ATGCAGCTCCTGGGGGTGATG -GCAG- CC -CTATTCTGCCGG 795 

I Ml MM II III II III II I I II I II I I Mill I 

Sbjct: 837 CACTGGGAGCCAGCCTGCG -GCCCCTC- TGCCTGCCCTATGCTGACCACCAC - CTGCCTG 893 
Query: 796 GGATGGTGTGTACCAGTGCTGTGGGT -GAGCT -GCCCAGCTGTGAGGCCAACCAACCAGC 853 

II I II I I I I MM I II llllll I MM I II I I 

Sbjct: 894 ATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGC-AGGC — ATCAGCTCCC 950 
Query: 854 TGCTGACAGGGGACCTGGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAA-ATGGCAT 912 

I I Mill I I II II I II II I II I Hill I III III II 

Sbjct: 951 TCCAG AC AGTGCCCGTGACCCTC CTGGGGC - CTAGGGCCTGC AGCC - GGCTGC ATG - C AG 1007 
Query: 913 -TACTGCCCCTG- TC -CTCCCC - ACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCC 968 

I Ml II I I III I III I I I III I Mill II I 

Sbjct: 1008 CTCCTGGGGGTGATGGCAGCCCTATTCTGCCG-G - G - GATGGTGTGTACC AGTGCTGTGG 1064 
Query: 969 CAGAAGCCCAGCAGCTGTGGGAAGGAACCTGCCTGGGGC- -CACAGGTGC 1016 

II II II MM II I II MM I III III III Mill 

Sbjct: 1065 GTGA -GCTGCCC AGCTGTGAG — GG - - CCTGTCTGGGGC ACC ACTGGTGC 1109 



Table 15. BLASTP identity search for the protein of the invention.

>patp:Y41704 Human PR0351 protein sequence - Homo sapiens, 571 aa. (SEQ ID NO:73)
Length = 571

Plus Strand HSPs:

Score = 1514 (533.0 bits), Expect = 1.6e-154, P = 1.6e-154
Identities = 278/279 (99%), Positives = 278/279 (99%), Frame = +3

Query:   3 RHLSNPARPGMLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPV 182
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 215 RHLSNPARPGMLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPV 274

Query: 183 LLTNTAAHSSWLQARVQGAAFLAQSPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEA 362
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 275 LLTNTAAHSSWLQARVQGAAFLAQSPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEA 334

Query: 363 RLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTH 542
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 335 RLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTH 394

Query: 543 PEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWVLGRARPGAGISSLQTVPV 722
           ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Sbjct: 395 PEGGYDMALLLLAQPVTLGASLRPLCLPYPDHHLPDGERGWVLGRARPGAGISSLQTVPV 454



15 



Score = 225 (79.2 bits), Expect = 4.6e-15, P = 4.6e-15 (SEQ ID NO:118)
Identities = 71/203 (34%), Positives = 95/203 (46%), Frame = +3






Query: 


183 


Sbjct: 


275 


Query: 


363 


Sbjct: 


335 


Query: 


543 


Sbjct: 


395 


Query: 


723 


Sbjct: 


455 


Score 


= 225 


Identities 


Query: 


339 


Sbjct: 


63 


Query: 


495 


Sbjct: 


123 


Query: 


669 


Sbjct: 


179 


Query: 


834 


Sbjct: 


234 


Score 


= 125 


Identities : 


Query: 


15 


Sbjct: 


474 


Query: 


195 


Sbjct: 


532 



Query: 723 TLLGPRACSRLHAAPGGDGSPILPGMVCTSAVGELPSCE 839
           |||||||||||||||||||||||||||||||||||||||
Sbjct: 455 TLLGPRACSRLHAAPGGDGSPILPGMVCTSAVGELPSCE 493



PS PWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPE - - EWSVGLGT RP 494 

I | | |*| ♦ II I Ml" llllllll I I III 11+ I 



- - EEWGLKQLI LHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWV 668 
III* II III I hill II I I Mill III I 



JLQTVPVTLLGPRACS RLHAAPGGDGSPILPGMVCTSAVGELPS 833 

•1+ + + 1+ l + + H + I 111*1 II 
'LRNLRLRLISRPTCNCIYNQLHQRHLSN- - PARPGMLCG GPQPG 233 



II I + I ♦+ +1 



Score = 125 (44.0 bits), Expect = 0.00067, P = 0.00067 (SEQ ID NO:119)
Identities = 32/95 (33%), Positives = 47/95 (49%), Frame = +3



lll+l I l+l II I I I II ll + II + l I + I 

>ILPGMVCTSAV-GELPSCEGLSGAP-LVHEVRGTWFLAGLHSFGDACQGPARPAVFTi 

^AHSSWLQARVQGAAFLAQSPETPEMSDEDSCVA 299 
1+ 1+ + + + 1+ II II ++ ll+l 



>patp:Y90291 Human peptidase, HPEP-8 protein sequence - Homo sapiens, 267 aa. 

(SEQ ID NO: 74) 



Length = 267

Plus Strand HSPs:

Score = 1028 (361.9 bits), Expect = 5.0e-103, P = 5.0e-103
Identities = 189/189 (100%), Positives = 189/189 (100%), Frame = +3

Query: 273 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 452
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   1 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 60

Query: 453 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 632
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  61 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 120

Query: 633 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 812
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 180

Query: 813 AVGELPSCE 839
           |||||||||
Sbjct: 181 AVGELPSCE 189



Score = 125 (44.0 bits), Expect = 0.00016, P = 0.00016 (SEQ ID NO:120)
Identities = 32/95 (33%), Positives = 47/95 (49%), Frame = +3

Query:  15 NPARPGMLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTN 194
           +|  |||+|     |    |+| || | |  |  | |  ||+ ||  +|     | + |
Sbjct: 170 SPILPGMVCTSAV-GELPSCEGLSGAP-LVHEVRGTWFLAGLHSFGDACQGPARPAVFTA 227

Query: 195 TAAHSSWLQARVQGAAFLAQSPETPEMSDEDSCVA 299
             |+  |+ + +    + |+ || || ++  ||+|
Sbjct: 228 LPAYEDWVSS-LDWQVYFAEEPE-PE-AEPGSCLA 259
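For the translated hits in Table 15 (Frame = +3), the Query coordinates are nucleotide positions while the Sbjct coordinates are residue positions. The following is a minimal Python sketch of that bookkeeping, using the coordinates of the second Y90291 HSP above. The function names are illustrative assumptions and are not part of any BLAST program.

```python
def reading_frame(start_nt: int) -> int:
    """Forward reading frame (1, 2 or 3) of a codon starting at a 1-based position."""
    return (start_nt - 1) % 3 + 1


def residues_spanned(start_nt: int, end_nt: int) -> int:
    """Number of whole codons in an inclusive, 1-based nucleotide range."""
    span = end_nt - start_nt + 1
    if span % 3 != 0:
        raise ValueError("range does not cover a whole number of codons")
    return span // 3


# Query rows of the second Y90291 HSP (nucleotide coordinates from Table 15).
query_rows = [(15, 194), (195, 299)]
row_lengths = [residues_spanned(start, end) for start, end in query_rows]

print(reading_frame(15))   # 3
print(row_lengths)         # [60, 35]
print(sum(row_lengths))    # 95
```

The total of 95 equals the denominator of the reported identities (32/95) because this HSP has no gaps in the query rows.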





Table 16. BLASTN identity search (versus the human SeqCalling database) for the
Peptidase-like protein of the invention.

>s3aq:132854740 Category D: 12 frag (12 non-5' sig-CG), 636 bp. (SEQ ID NO:75)
Length = 636

Minus Strand HSPs:

Score = 1423 (213.5 bits), Expect = 7.0e-59, P = 7.0e-59
Identities = 313/343 (91%), Positives = 313/343 (91%), Strand = Minus / Plus



Query: 


754 


AGCCGGCTGCAG-GCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGAT 


696 


Sbjct: 


295 


III llllll 1 III 1 III II II II III 1 Mil 

AGCTGGCTGCCCCGGCCT -GCAGGTTGGATGGAC AGCAGCCCTGGCCCT -GTGCCCACCT 


352 


Query: 


695 


GCCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 


636 




IIIMMMMMMMIMMMMMMMMMMIIMMMMIIMMIIMI 




Sbjct: 


353 


ACCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 


412 


Query: 


635 


GTCAGCATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 


576 




Mill 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 




Sbjct: 


413 


GTCAGGATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 


472 


Query: 


575 


CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 


516 




1 1 Ml 1 1 II II 1 1 II 1 INI II 1 1 II 1 II 1 1 II 1 1 1 1 IN Mill II M 1 1 II II II 1 1 1 




Sbjct: 


473 


CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 


532 


Query: 


515 


CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 


456 




MIMIM MM IMIMI 1 1 1 1 II III MM 1 1 1 III MM MM Ml 1 II lllllll 




Sbjct: 


533 


CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 


592 


Query: 


455 


CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 412 






IMMIMIIIMMMIMIMMMM IIMIIIIIIIIMI 




Sbjct: 


593 


CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 636 




Score = 757 (113.6 bits), Expect = 8.5e-29, P = 8.5e-29 (SEQ ID NO:121)
Identities = 165/179 (92%), Positives = 165/179 (92%), Strand = Minus / Plus


Query: 


869 


AGGTCCCCTGTCAGCAGCTGGTTGGTTGGCCTCACAGCTGGGCAGCTCACCCACAGCACT 


810 






MM III 1 MM II 1 II M II M II II MM M Ml 1 M 1 1 1 M II 




Sbjct: 


105 


AGGTAAGGTGTGGGGGCCTGG — GGCTCACCTCACAGCTGGGCAGCTCACCCACAGCACT 


162 


Query: 


809 


GGTACACACCATCCCCGGCAGAATAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG 


750 




II Ml 1 1 1 Ml 1 1 1 1 M 1 1 1 M 1 III 1 1 1 M 1 1 M 1 1 M II 1 1 Ml 1 III 1 1 1 1 M 1 1 M 




Sbjct: 


163 


GGTACACACCATCCCCGGCAGAATAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG 


222 


Query: 


749 


GCTGC AGGCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGATGCCTG 691 
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1 1 II II M II II II I il 1 1 1 1 1 1 1 M MINIM 1 1 MMMI 1 1 1 II 1 1 1 Ml MM 

Sbjct: 223 GCTGCAGGCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGATGCCTG 281

>s3aq:134913963 Category E: 1 frag (1 non-CG EST), 415 bp. (SEQ ID NO:76)
Length = 415

Plus Strand HSPs:

Score = 297 (44.6 bits), Expect = 8.0e-07, P = 8.0e-07
Identities = 61/63 (96%), Positives = 61/63 (96%), Strand = Plus / Plus

Query: 1138 TTGTTTTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 1197
            ||| | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   10 TTGGTGTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 69

Query: 1198 ATT 1200
            |||
Sbjct:   70 ATT 72
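The identity figures reported for a nucleotide HSP can be recomputed from its aligned rows. The sketch below does this in Python for the Category E hit of Table 16. The helper names are hypothetical, and truncating the percentage to an integer is an assumption that happens to agree with the 96% shown in the table.

```python
def match_line(query: str, sbjct: str) -> str:
    """Return a '|' for every column where both rows carry the same base."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if q == s and q != "-" else " " for q, s in zip(query, sbjct)
    )


def identity(query: str, sbjct: str) -> tuple[int, int, int]:
    """Identical columns, alignment length, and truncated percent identity."""
    line = match_line(query, sbjct)
    identical = line.count("|")
    return identical, len(line), 100 * identical // len(line)


# Aligned rows of the Category E hit (Query 1138-1200, Sbjct 10-72).
query = "TTGTTTTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA" "ATT"
sbjct = "TTGGTGTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA" "ATT"

print(identity(query, sbjct))  # (61, 63, 96)
```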



Table 17. ClustalW alignment of the protein of the invention.

Information for the ClustalW proteins:

Accno                       Common Name                                  Length
CG50817-06 (SEQ ID NO:47)   novel Peptidase-like protein
Y41704 (SEQ ID NO:122)      Human PRO351 protein sequence.               571
Y90291 (SEQ ID NO:123)      Human peptidase, HPEP-8 protein sequence.    267

In the alignment shown above, black outlined amino acid residues indicate regions of
conserved sequence (i.e., regions that may be required to preserve structural or functional
properties); greyed amino acid residues can be mutated to a residue with comparable steric
and/or chemical properties without altering protein structure or function (e.g. L to V, I, or M);
non-highlighted amino acid residues can potentially be mutated to a much broader extent without
altering structure or function.

Table 18. PSORT, SignalP and hydropathy results for CG50817-06

cytoplasm                    Certainty=0.4500 (Affirmative) < succ>
microbody (peroxisome)       Certainty=0.3000 (Affirmative) < succ>
lysosome (lumen)             Certainty=0.2334 (Affirmative) < succ>
mitochondrial matrix space   Certainty=0.1000 (Affirmative) < succ>

Is the sequence a signal peptide?

# Measure   Position   Value   Cutoff   Conclusion
max. C      45         0.253   0.37     NO
max. Y      17         0.064   0.34     NO
max. S      68         0.536   0.88     NO
mean S      1-16       0.130   0.48     NO

SECP14

A SECP14 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:48) and encoded polypeptide sequence (SEQ ID NO:49) of clone
CG50817-06 directed toward novel serine protease-like proteins and nucleic acids encoding
them. Figure 19 illustrates the nucleic acid and amino acid sequences, respectively.

This clone includes a nucleotide sequence (SEQ ID NO:48) of 1214 bp. The nucleotide
sequence includes an open reading frame (ORF) beginning with an ATG initiation codon at
nucleotides 31-33 and ending at nucleotides 1186-1188. Putative untranslated regions, if any,
are found upstream from the initiation codon and downstream from the termination codon. The
encoded protein having 385 amino acid residues is presented using the one-letter code in Figure
19. The protein encoded by clone CG51099-03 is predicted by the PSORT program to be located
outside of the membrane with a certainty of 0.5804, and appears to be a signal protein (see Table
22 below).
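The stated ORF coordinates and protein length can be cross-checked with simple arithmetic. The following Python sketch assumes that nucleotides 1186-1188 are the termination codon, so the final codon is not counted as a residue; the function name is illustrative only.

```python
def orf_protein_length(first_nt: int, last_nt: int) -> int:
    """Residues encoded by an ORF given inclusive 1-based coordinates.

    Assumes the range runs from the first base of the initiation codon to
    the last base of the termination codon.
    """
    span = last_nt - first_nt + 1
    if span % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return span // 3 - 1  # subtract the termination codon


print(orf_protein_length(31, 1188))  # 385
```

The result agrees with the 385 amino acid residues given for the encoded protein.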

The serine protease tryptase (EC No. 3.4.21.59), which is almost exclusively
expressed in mast cells, is released by mast cell degranulation in an enzymatically active form
together with other mediators, e.g. histamine, into the extracellular space and the circulation. The
capability of the enzyme to directly stimulate several cell types as well as to cleave polypeptide
hormones and to activate pro-enzymes suggests a role for tryptase in inflammatory and
tissue-remodeling processes. Therefore, in the skin, a role of tryptase is suggested not only in
mastocytosis and immediate type hypersensitivity reactions, but also in other inflammatory
diseases, degenerative or neoplastic conditions as well as in wound healing, where an
accumulation and/or activation of mast cells is found. Extracellular tryptase may be superior to
histamine as a parameter for the onset and course of immediate type reactions and as an indicator
for the activation of mast cells in other conditions. Its absence during histamine-liberating
reactions may suggest basophil activation. In addition, tryptase has been shown to be a sensitive
and specific marker for the localization of mast cells in tissues (Ludolf-Hauser et al., 1999,
Hautarzt 50:556-61).

Tryptases are stored in abundance in the secretory granules of mouse (McNeil et al., 1992,
Proc. Natl. Acad. Sci. U. S. A. 89, 11174-11178; Johnson, D. A., and Barton, G., 1992, Protein
Sci. 1, 370-377), and human (Vanderslice et al., 1990, Proc. Natl. Acad. Sci. U. S. A. 87,
3811-3815) mast cells (MCs). In humans, the four homologous tryptases (designated tryptases I, II,
III, and α) that have been cloned reside at a complex on chromosome 16 (Pallaoro et al., 1999, J.
Biol. Chem. 274, 3355-3362). Although only two tryptases (designated mouse MC protease
(mMCP) 6 and mMCP-7) have been identified so far in the mouse, their genes reside ~1.2
centimorgans away from each other on the syntenic region of mouse chromosome 17 (Gurish et
al., 1994, Mammal. Genome 5, 656-657). Despite the chromosomal clustering of their genes,
these mouse tryptases are differentially regulated in vivo (Reynolds et al., 1990, Proc. Natl.
Acad. Sci. U. S. A. 87, 3230-3234) and in vitro (Reynolds et al., 1991, J. Biol. Chem. 266,
3847-3853; McNeil et al., 1992, Proc. Natl. Acad. Sci. U. S. A. 89, 11174-11178) at the levels of
gene transcription (Morri et al., 1996, Blood 88, 2488-2494) and mRNA stability.

All known mouse and human tryptases in this family are initially translated as zymogens.
They possess an ~20-residue hydrophobic signal peptide which is presumed to be removed in the
endoplasmic reticulum immediately after the translated zymogen is translocated into the lumen.
They also possess an ~10-residue propeptide preceding the mature portion of the enzyme, which
consists of ~245 amino acids. Although tryptases undergo variable N-linked glycosylation during
their biosynthesis (Ghidyal et al., 1994, J. Immunol. 153, 2624-2630), the current members of
the family appear to be targeted to the secretory granule by a serglycin proteoglycan-dependent
mechanism (Ghidyal et al., 1996, J. Exp. Med. 184, 1061-1073) rather than by a
Man-PO4-dependent mechanism, as are classical lysosomal enzymes.

Recently, Wong et al. (1999, J Biol Chem 274, 30784-30793) described a novel mouse
gene, and its human ortholog, which encode an unusual transmembrane tryptase (TMT).
Comparative structural studies indicated that the putative transmembrane tryptase (TMT)
possesses a unique substrate-binding cleft. As assessed by RNA blot analyses, mTMT is
expressed in mice in both strain- and tissue-dependent manners. Thus, different transcriptional
and/or post-transcriptional mechanisms are used to control the expression of mTMT in vivo.
Analysis of the corresponding tryptase locus in the human genome resulted in the isolation and
characterization of the hTMT gene. The hTMT transcript is expressed in numerous tissues and is
also translated. Analysis of the tryptase family of genes in mice and humans now indicates that a
primordial serine protease gene duplicated early and often during the evolution of mammals to
generate a panel of homologous tryptases in each species that differ in their tissue expression,
substrate specificities, and physical properties.

Similarities

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 1213 of 1213 bases (100%) identical to a
gb:GENBANK-ID:AX079882|acc:AX079882.1 mRNA from Homo sapiens (Sequence 13 from Patent
WO0105971) (see Table 19). The full amino acid sequence of the protein of the invention was
found to have 385 of 385 amino acid residues (100%) identical to, and 385 of 385 amino acid
residues (100%) similar to, the 385 amino acid residue ptnr:SPTREMBL-ACC:Q9UI38 protein
from Homo sapiens (Human) (TESTES-SPECIFIC PROTEIN TSP50) (see Table 20).

A multiple sequence alignment is given in Table 21, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences.

The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the Interpro domain accession number. Significant domains are summarized below:



Model     Domain   seq-f   seq-t   hmm-f   hmm-t   score   E-value
trypsin   1/2      118     297     6       199     104.4   2.6e-32
trypsin   2/2      313     353     215     259     35.9    1.6e-10
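The domain summary above can be read as two segments of the 385-residue protein that match the trypsin model. The following Python sketch, with illustrative variable names that are not taken from any domain-search program, computes how many residues each segment covers from the seq-f and seq-t columns.

```python
# (model, domain, seq_from, seq_to, hmm_from, hmm_to, score, e_value)
domain_hits = [
    ("trypsin", "1/2", 118, 297, 6, 199, 104.4, 2.6e-32),
    ("trypsin", "2/2", 313, 353, 215, 259, 35.9, 1.6e-10),
]

PROTEIN_LENGTH = 385  # residues in the encoded protein, as stated in the text

# Inclusive coordinates, so the covered length is seq_to - seq_from + 1.
segment_lengths = [hit[3] - hit[2] + 1 for hit in domain_hits]
covered = sum(segment_lengths)

print(segment_lengths)                      # [180, 41]
print(covered, "of", PROTEIN_LENGTH)        # 221 of 385
```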

The catalytic activity of the serine proteases from the trypsin family is provided by a
charge relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which
itself is hydrogen-bonded to a serine. The sequences in the vicinity of the active site serine and
histidine residues are well conserved in this family of proteases (Sprang et al., 1987, Science
237:905-909). A partial list of proteases known to belong to the trypsin family is shown below.

- Acrosin.
- Blood coagulation factors VII, IX, X, XI and XII, thrombin, plasminogen,
and protein C.
- Cathepsin G.
- Chymotrypsins.
- Complement components C1r, C1s, C2, and complement factors B, D and I.
- Complement-activating component of RA-reactive factor.
- Cytotoxic cell proteases (granzymes A to H).
- Duodenase I.
- Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin).
- Enterokinase (EC 3.4.21.9) (enteropeptidase).
- Hepatocyte growth factor activator.
- Hepsin.
- Glandular (tissue) kallikreins (including EGF-binding protein types A, B,
and C, NGF-gamma chain, gamma-renin, prostate specific antigen (PSA) and
tonin).
- Plasma kallikrein.
- Mast cell proteases (MCP) 1 (chymase) to 8.
- Myeloblastin (proteinase 3) (Wegener's autoantigen).
- Plasminogen activators (urokinase-type, and tissue-type).
- Trypsins I, II, III, and IV.
- Tryptases.
- Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin,
and protein C activator.
- Collagenase from common cattle grub and collagenolytic protease from
Atlantic sand fiddler crab.
- Apolipoprotein(a).
- Blood fluke cercarial protease.
- Drosophila trypsin-like proteases: alpha, easter, snake-locus.
- Drosophila protease stubble (gene sb).
- Major mite fecal allergen Der p III.

All the above proteins belong to family S1 in the classification of peptidases.



This indicates that the sequence of the invention has properties similar to those of other 
proteins known to contain this/these domain(s) and similar to the properties of these domains. 



Chromosomal information:



The Serine Protease-like gene disclosed in this invention maps to chromosome 3. This 
assignment was made using mapping information associated with genomic clones, public genes 
and ESTs sharing sequence identity with the disclosed sequence and CuraGen Corporation's 
Electronic Northern bioinformatic tool. 



Tissue expression

The Serine Protease-like gene disclosed in this invention is expressed in at least the
following tissues: adipose, adrenal gland, thyroid, brain, heart, skeletal muscle, bone marrow,
colon, bladder, liver, lung, mammary gland, placenta, testis. Expression information was derived
from the tissue sources of the sequences that were included in the derivation of the sequence of
CuraGen Acc. No. CG51099-03. The sequence is predicted to be expressed in the following
tissues because of the expression pattern of (GENBANK-ID:
gb:GENBANK-ID:AX079882|acc:AX079882.1) a closely related Sequence 13 from Patent WO0105971
homolog in species Homo sapiens: testis.

Cellular Localization and Sorting

The PSORT, SignalP and hydropathy profile for the Serine Protease-like protein are
shown in Table 22. The results predict that this sequence has a signal peptide and is likely to be
localized extracellularly with a certainty of 0.5804. The signal peptide is predicted by SignalP to
be cleaved between amino acids 39 and 40: CWG-AG.
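The predicted cleavage position can be checked against the N-terminal sequence. The Python sketch below is illustrative only: it uses the first 50 residues as they appear in the subject line of the Table 20 alignment, and the function name is an assumption rather than part of SignalP.

```python
def split_at_cleavage(protein: str, last_signal_residue: int) -> tuple[str, str]:
    """Split a precursor into (signal peptide, remainder) after a 1-based position."""
    return protein[:last_signal_residue], protein[last_signal_residue:]


# First 50 residues of the 385-residue protein (from the Table 20 alignment).
n_terminus = "MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTA"

signal_peptide, remainder = split_at_cleavage(n_terminus, 39)

# The three residues before and the two after the site give the reported motif.
motif = f"{signal_peptide[-3:]}-{remainder[:2]}"
print(len(signal_peptide), motif)  # 39 CWG-AG
```

The motif recovered at positions 37-41 matches the CWG-AG site reported by SignalP.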

Functional Variants and Homologs

The novel nucleic acid of the invention encoding a Serine Protease-like protein includes
the nucleic acid whose sequence is provided in Figure 19, or a fragment thereof. The invention
also includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 19 while still encoding a protein that maintains its Serine
Protease-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to the sequence of
CuraGen Acc. No. CG51099-03, including nucleic acid fragments that are complementary to any
of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic
acid fragments, or complements thereto, whose structures include chemical modifications. Such
modifications include, by way of non-limiting example, modified bases, and nucleic acids whose
sugar phosphate backbones are modified or derivatized. These modifications are carried out at
least in part to enhance the chemical stability of the modified nucleic acid, such that they may be
used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In
the mutant or variant nucleic acids, and their complements, up to about 0% of the bases may be
so changed.

The novel protein of the invention includes the Serine Protease-like protein whose
sequence is provided in Figure 19. The invention also includes a mutant or variant protein any of
whose residues may be changed from the corresponding residue shown in Figure 19 while still
encoding a protein that maintains its Serine Protease-like activities and physiological functions,
or a functional fragment thereof. In the mutant or variant protein, up to about 0% of the amino
acid residues may be so changed.

Antibodies

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Serine Protease-like
protein may have important structural and/or physiological functions characteristic of the
Trypsin family. Therefore, the nucleic acids and proteins of the invention are useful in potential
diagnostic and therapeutic applications and as a research tool. These include serving as a specific
or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or
amount of the nucleic acid or the protein is to be assessed. These also include potential
therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule
drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody),
(iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent promoting
tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: adrenoleukodystrophy,
congenital adrenal hyperplasia, hyperthyroidism, hypothyroidism, Von Hippel-Lindau (VHL)
syndrome, Alzheimer's disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease,
Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis,
ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain,
neurodegeneration, cardiomyopathy, atherosclerosis, hypertension, congenital heart defects,
aortic stenosis, atrial septal defect (ASD), atrioventricular (A-V) canal defect, ductus arteriosus,
pulmonary stenosis, subaortic stenosis, ventricular septal defect (VSD), valve diseases,
scleroderma, obesity, transplantation, muscular dystrophy, myasthenia gravis, hemophilia,
hypercoagulation, idiopathic thrombocytopenic purpura, autoimmune disease, allergies,
immunodeficiencies, graft versus host disease, cirrhosis, systemic lupus erythematosus, asthma,
emphysema, ARDS, fertility, cancer, as well as other diseases, disorders and conditions.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in diagnostic and/or
therapeutic methods.



Table 19. BLASTN search using CuraGen Acc. No. CG51099-03.

>gb:GENBANK-ID:AX079882|acc:AX079882.1 Sequence 13 from Patent WO0105971 - Homo
sapiens, 1359 bp. (SEQ ID NO:77)
Length = 1359

Plus Strand HSPs:

Score = 6065 (910.0 bits), Expect = 4.8e-268, P = 4.8e-268
Identities = 1213/1213 (100%), Positives = 1213/1213 (100%), Strand = Plus / Plus

Query: 


1 


Sbjct : 


15 


Query: 


61 


Sbjct : 


75 


Query : 


121 


Sbjct: 


135 


Query: 


181 


Sbjct: 


195 


Query: 


241 


Sbjct: 


255 


Query: 


301 


Sbjct: 


315 


Query: 


361 


Sbjct: 


375 


Query: 


421 


Sbjct: 


435 


Query: 


481 


Sbjct: 


495 


Query: 


541 


Sbjct: 


555 


Query: 


601 


Sbjct: 


615 


Query: 


661 


Sbjct: 


675 


Query: 


721 


Sbjct: 


735 



CGGAGAGACGCAGTCGGCTGCCACCCCGGGATGGGTCGCTGGTGCCAGACCGTCGCGCGC 6 0 

1 1 II i I i II Mil ( I IN II II I! I II I II III II 1 1 1 1 II I II Nil 1 1 1 1 1 1 1 II I 

CGGAGAGACGCAGTCGGCTGCCACCCCGGGATGGGTCGCTGGTGCCAGACCGTCGCGCGC 7 4 
GGGCAGCGCCCCCGGACGTCTGCCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCTG 120 

1 1 II II I II 1 1 II I III I II M II I II Ml III II I II I II 1 1 1 II II 1 1 II I II II 1 1 

GGGCAGCGCCCCCGGACGTCTGCCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCTG 134 
TTGCTGAGGTCTGCAGGTTGCTGGGGCGCAGGGGAAGCCCCGGGGGCGCTGTCCACTGCT 180 

1 1 1 1 1 1 1 1 1 1 1 I II II 1 1 II li IN II 1 1 II 1 1 1 1 1 Ml H I II II II 1 1 1 1 1 1 M 1 1 

TTGCTGAGGTCTGCAGGTTGCTGGGGCGCAGGGGAAGCCCCGGGGGCGCTGTCCACTGCT 194 
GATCCCGCCGACCAGAGCGTCCAGTGTGTCCCCAAGGCCACCTGTCCTTCCAGCCGGCCT 240 

ii i inn ii ii iii ii iiiiiiiiiii ii ii ii i ii ii I ii inn ii in ii ii i ii i 

GATCCCGCCGACCAGAGCGTCCAGTGTGTCCCCAAGGCCACCTGTCCTTCCAGCCGGCCT 254 
CGCCTTCTCTGGCAGACCCCGACCACCCAGACACTGCCCTCGACCACCATGGAGACCCAA 300 

II III II I MM IMIIMI II III II III MIM MM III II II II I II 1 1 1 11 1 1 

CGCCTTCTCTGGCAGACCCCGACCACCCAGACACTGCCCTCGACCACCATGGAGACCCAA 314 
TTCCCAGTTTCTGAAGGCAAAGTCGACCCATACCGCTCCTGTGGCTTTTCCTACGAGCAG 360 

MIMI II IIMIIIIIIMIIMMM II 1 1 1 1 1 1 1 II Ml 1 1 1 II 1 1 M M 1 1 II M I 

TTCCCAGTTTCTGAAGGCAAAGTCGACCCATACCGCTCCTGTGGCTTTTCCTACGAGCAG 374 
GACCCCACCCTCAGGGACCCAGAAGCCGTGGCTCGGCGGTGGCCCTGGATGGTCAGCGTG 420 

M M I! II IMMMMMM M III II III MIM MM MMMI III III M III II 

GACCCCACCCTCAGGGACCCAGAAGCCGTGGCTCGGCGGTGGCCCTGGATGGTCAGCGTG 434 
CGGGCCAATGGCACACACATCTGTGCCGGCACCATCATTGCCTCCCAGTGGGTGCTGACT 480 

I MM Ml III 1 1 1 MM Mill Ml Ml II II I II I II I II I M II II III II I MM I 

CGGGCCAATGGCACACACATCTGTGCCGGCACCATCATTGCCTCCCAGTGGGTGCTGACT 494 



II II I III II II I M MM MM Mill I II I II I II II 1 1 1 I II II II II I II I 



III 



II 



1 1 1 1 M 1 1 1 1 1 1 1 III III 



M 1 1 MM 1 1 II II III II I M M II 1 1 1 II I III I 



1 1 1 1 1 II 1 1 1 1 1 Ml Ml M II 1 1 II II MM I 



720 



AAGCTCAAGCAGGAACTCAAGTACAGCAATTACGTGCGGCCCATCTGCCTGCCTGGCACG 

1 1 II II M M I M I II II I II II MM II M Ml II Ml Ml M M 1 1 1 1 II 1 1 1 1 1 1 1 1 

AAGCTC AAGCAGGAACTCAAGTACAGCAATTACGTGCGGCCC ATCTGCCTGCCTGGCACG 734 
GACTATGTGTTGAAGGACCATTCCCGCTGCACTGTGACGGGCTGGGGACTTTCCAAGGCT 780 

Ml II III III II M IMMMIMI MM II II III IMIMM II IMMMMMM 

GACTATGTGTTGAAGGACCATTCCCGCTGCACTGTGACGGGCTGGGGACTTTCCAAGGCT 794 
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Query: 


781 


Sbjct: 


795 


Query: 


841 


Sbjct: 


855 


Query: 


901 


Sbjct: 


915 


Query: 


961 


Sbjct: 


975 


Query: 


1021 


Sbjct: 


1035 


Query: 


1081 


Sbjct: 


1095 


Query: 


1141 


Sbjct: 


1155 


Query: 


1201 


Sbjct: 


1215 


Table 


20. 



GACGGCATGTGGCCTCAGTTCCGGACCATTCAGGAGAAGGAAGTCATCATCCTGAACAAC 840 

1 1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 n 1 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 1 1 

GACGGCATGTGGCCTCAGTTCCGGACCATTCAGGAGAAGGAAGTCATCATCCTGAACAAC 854 
AAAGAGTGTGACAATTTCTACCACAACTTCACCAAAATCCCCACTCTGGTTCAGATCATC 900 

Mill II III llllllll Mill III II Ml II 1 1 1 1 M ! I Ml III II I II 1 1 1 1 1 1 II 

AAAGAGTGTG ACAATTTCTACCACAACTTCACCAAAATCCCC ACTCTGGTTC AGATCATC 914 
AAGTCCCAGATGATGTGTGCGGAGGACACCCACAGGGAGAAGTTCTGCTATGAGCTAACT 960 

II I II II II I II I II I II III III 1 1 1 1 1 1 1 1 II II 1 1 II II II Mill I II I II I M II 

AAGTCCCAGATG ATGTGTGCGGAGGACACCC ACAGGGAGAAGTTCTGCTATGAGCTAACT 974 
GG AGAGCCCTTGGTCTGCTCC ATGGAGGGCACGTGGTACCTGGTGGGATTGGTGAGCTGG 1020 

IMIMIII 1 1 II IMIMI IMMIIMM Mill llllllll II III M III III I 

GGAGAGCCCTTGGTCTGCTCC ATGGAGGGCACGTGGTACCTGGTGGGATTGGTGAGCTGG 1034 
GGTGC AGGCTGCCAGAAGAGCGAGGCCCCACCCATCTACCTACAGGTCTCCTCCTACCAA 1080 

I MM Mill I III IMMMMMM Mill MIMM I M MMMII M III I II II 

GGTGC AGGCTGCCAGAAGAGCGAGGCCCCACCCATCTACCTACAGGTCTCCTCCTACCAA 1094 
CACTGGATCTGGGACTGCCTCAACGGGCAGGCCCTGGCCCTGCCAGCCCCATCCAGGACC 1140 

ii iii 1 1 iii mi mi in in ii in i ii n iiiiiin in ii in n i n in ii 

CACTGGATCTGGGACTGCCTCAACGGGCAGGCCCTGGCCCTGCCAGCCCCATCCAGGACC 1154 
CTGCTCCTGGCACTCCCACTGCCCCTCAGCCTCCTTGCTGCCCTCTGACTCTGTGTGCCC 1200 

III M 1 1 III M II I Ml III MM 1 1 1 II II II II I II 1 1 1 III II I II 1 1 1 M I M I 

CTGCTCCTGGCACTCCCACTGCCCCTCAGCCTCCTTGCTGCCCTCTGACTCTGTGTGCCC 1214 
TCCCTCACTTGTG 1213 

Minimi iii 

TCCCTCACTTGTG 1227 

Table 20. BLASTP search using the protein of CuraGen Acc. No. CG51099-03.

>ptnr:SPTREMBL-ACC:Q9UI38 TESTES-SPECIFIC PROTEIN TSP50 - Homo sapiens (Human),
385 aa. (SEQ ID NO:78)
Length = 385

Score = 2090 (735.7 bits), Expect = 4.5e-216, P = 4.5e-216
Identities = 385/385 (100%), Positives = 385/385 (100%)

Query:   1 MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   1 MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV 60

Query:  61 PKATCPSSRPRLLWQTPTTQTLPSTTMETQFPVSEGKVDPYRSCGFSYEQDPTLRDPEAV 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  61 PKATCPSSRPRLLWQTPTTQTLPSTTMETQFPVSEGKVDPYRSCGFSYEQDPTLRDPEAV 120
ARRWPWMVSVl^GTHICAGTIIASQWv_.OTAHCLI^^ 180 

M I II 1 1 III III Mill MM II 1 1 1 1 1 Mill III II III III II MM 1 1 II II I M 

ARRWPWMVSVl^GTHICAGTIIASQWvX.TVAHCLIWRDVIYSWVGSPWIDQOT 180 

VPVXQVI-fflSRYTttQRFWSWVGQANDIGLLiaKQELKY^ 240 

III II II II I II I III II III II II I II I II I II II I llllllll II II II I I II III II 

vpvxqvimhsryraqr™swvgqandigllklkqelkysnytopiclpgtdyvxkd^ 240 
tvtgwglskadgmwpqfrtiqekevi i lnnkecdnfyhnftki ptlvqi iksqmmcaedt 300 

mi iii in iii i mi ii mi ii in ii iii ii iii i n inn mm mm iii ii 

TVTGWGLSKADGMWPQFRTIQEKEVI I LNNKECDNFYHNFTKI PTLVQI I KSQMMCAEDT 300 
HREKFC_ELTGEPLVCSMEGTWYLVGLVSWGAGCQKSEAPPIYI^VSSYQHWIWDCLNGQ 360 

IMIIIIIIIIIIIIMIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIII 

HREKFCYELTGEPLVCSMEGTVWLVGLVSWGAGCQKSEAPPIYLQVSSYQHWIWDCLNGQ 360 

ALALPAPSRTLLLALPLPLSLLAAL 385 
lllllllllllllllllllllllll 



Query: 


1 


Sbjct: 


1 


Query: 


61 


Sbjct: 


61 


Query: 


121 


Sbjct: 


121 


Query: 


181 


Sbjct: 


181 


Query: 


241 


Sbjct: 


241 


Query: 


301 


Sbjct: 


301 


Query: 


361 


Sbjct: 


361

Table 21. ClustalW alignment of CG51099-03 protein with related proteins.

Information for the ClustalW proteins:

Accno                        Common Name                                    Length
CG51099-03 (SEQ ID NO:49)    novel Serine Protease-like protein
TEST_HUMAN (SEQ ID NO:124)   TESTISIN PRECURSOR (EC 3.4.21.-)               314
                             (EOSINOPHIL SERINE PROTEASE 1) (ESP-1).
PSS8_HUMAN (SEQ ID NO:125)   PROSTASIN PRECURSOR (EC 3.4.21.-).             343
Q9UI38 (SEQ ID NO:78)        TESTES-SPECIFIC PROTEIN TSP50.                 385



In the alignment shown above, black outlined amino acid residues indicate residues 
identically conserved between sequences (i.e., residues that may be required to preserve 
structural or functional properties); amino acid residues with a gray background are similar to 
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar 
between sequences. 



Table 22. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG51099-03.

outside                           Certainty=0.5804 (Affirmative) < succ>
lysosome (lumen)                  Certainty=0.5144 (Affirmative) < succ>
microbody (peroxisome)            Certainty=0.1203 (Affirmative) < succ>
endoplasmic reticulum (membrane)  Certainty=0.1000 (Affirmative) < succ>

Is the sequence a signal peptide?

# Measure  Position  Value  Cutoff  Conclusion
  max. C   40        0.888  0.37    YES
  max. Y   40        0.848  0.34    YES
  max. S   30        0.975  0.88    YES
  mean S   1-39      0.708  0.48    YES

# Most likely cleavage site between pos. 39 and 40: CWG-AG
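The cleavage site reported in Table 22 can be checked against the sequence itself. The Python sketch below is illustrative only: it assumes that the 60-residue alignment row beginning MGRWCQ represents the N-terminus of the protein analysed in Table 22, and the helper name split_at_cleavage is hypothetical.

```python
# Illustrative sketch: split a precursor at a predicted signal-peptide cleavage site.
# Assumption: these 60 residues (the alignment row ending at position 60) are the
# N-terminus of the protein analysed in Table 22. The run of eight leucines is
# written as "L" * 8 so that the residue count is unambiguous.
N_TERMINUS = "MGRWCQTVARGQRPRTSAPSRAGA" + "L" * 8 + "RSAGCWGAGEAPGALSTADPADQSVQCV"


def split_at_cleavage(seq: str, last_signal_pos: int) -> tuple[str, str]:
    """Return (signal peptide, remainder); last_signal_pos is 1-based."""
    return seq[:last_signal_pos], seq[last_signal_pos:]


# Table 22 places the cleavage site between positions 39 and 40.
signal, mature = split_at_cleavage(N_TERMINUS, 39)

# Three residues before the cut and two after it, joined the way SignalP prints them.
site = f"{signal[-3:]}-{mature[:2]}"
print(len(signal), site)
```

Under the stated assumption, the residues flanking the cut reproduce the CWG-AG motif given in the table.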



SECP15



A SECP15 nucleic acid and polypeptide according to the invention include the nucleic
acid sequence (SEQ ID NO:50) and encoded polypeptide sequence (SEQ ID NO:51) of clone
CG57051-04, directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 20 illustrates the nucleic acid and amino acid sequences,
respectively. This clone includes a nucleotide sequence (SEQ ID NO:50) of 937 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 881-883.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein, having 242 amino acid residues, is
presented using the one-letter code in Figure 20. The protein encoded by clone CG57051-04 is
predicted by the PSORT program to be located at the endoplasmic reticulum with a certainty of
0.8200, and appears to be a signal protein (see Table 27 below).
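The ORF coordinates quoted above are consistent with the stated protein length. The following Python check is a sketch only: the function name is illustrative, and the coordinates are assumed to be 1-based and inclusive, as they appear in the text.

```python
def orf_codons(start_first_nt: int, stop_first_nt: int) -> int:
    """Number of sense codons between the first nucleotide of the initiation
    codon and the first nucleotide of the stop codon (1-based coordinates)."""
    span = stop_first_nt - start_first_nt
    if span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


# ATG at nucleotides 155-157, TAG at nucleotides 881-883, clone length 937 bp
# (all values taken from the text).
residues = orf_codons(155, 881)
five_prime_utr = 155 - 1      # nucleotides upstream of the ATG
three_prime_utr = 937 - 883   # nucleotides downstream of the TAG
print(residues, five_prime_utr, three_prime_utr)
```

The 726 nucleotides from position 155 through position 880 give 242 codons, matching the 242 residues stated for the encoded protein.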



PPARG ANGIOPOIETIN-RELATED PROTEIN - PGAR: 
Background



The peroxisome proliferator-activated receptors (PPARs) are members of the nuclear 
hormone receptor subfamily of transcription factors. PPARs form heterodimers with retinoid X 
receptors (RXRs) and these heterodimers regulate transcription of various genes. There are 3 
known subtypes of PPARs, PPAR-alpha (170998), PPAR-delta (600409), and PPAR-gamma.
PPAR-gamma is believed to be involved in adipocyte differentiation. Tontonoz et al. (1994)
found 2 isoforms of PPAR-gamma in mouse, gamma-1 and gamma-2, resulting from the use of
different initiator methionines. 

Elbrecht et al. (1996) cloned cDNAs of PPAR-gamma-1 and PPAR-gamma-2 from 
human fat cell cDNA by PCR using primers based on the mouse sequence and on a previously
published human cDNA sequence (Greene et al., 1995). They found that the human PPAR- 
gamma-1 and PPAR-gamma-2 genes have identical sequences except that PPAR-gamma-2 
contains an additional 84 nucleotides at its 5-prime end. The sequences obtained by Elbrecht et 
al. (1996) differed at 3 sites from the previously published human PPAR-gamma-1 sequence of 
Greene et al. (1995). By Northern blot analysis, Elbrecht et al. (1996) found that human
PPAR-gamma is expressed at high levels in adipocytes and at a much lower level in bone marrow,
spleen, testis, brain, skeletal muscle, and liver. 

The thiazolidinediones are synthetic compounds that can normalize elevated plasma 
glucose levels in obese, diabetic rodents and may be efficacious therapeutic agents for the
treatment of noninsulin-dependent diabetes mellitus. Lehmann et al. (1995) identified the
thiazolidinediones as high affinity ligands for mouse PPAR-gamma receptors. Elbrecht et al. 
(1996) confirmed that human PPAR-gamma-1 and PPAR-gamma-2 have similar activity and 
determined that 3 different thiazolidinedione compounds are agonists of PPAR-gamma-1 and 
PPAR-gamma-2. Elbrecht et al. (1996) speculated that the antidiabetic activity of the
thiazolidinediones in humans is mediated through the activation of PPAR-gamma-1 and
PPAR-gamma-2.

The nuclear receptor PPARG/RXRA heterodimer regulates glucose and lipid homeostasis 
and is the target for the antidiabetic drugs GI262570 and the thiazolidinediones. Gampe et al. 
(2000) reported the crystal structures of the PPARG and RXRA ligand-binding domains 
complexed with the RXRA ligand 9-cis-retinoic acid, the PPARG agonist GI262570, and
coactivator peptides. The structures provided a molecular understanding of the ability of RXRs
to heterodimerize with many nuclear receptors and of the permissive activation of the
PPARG/RXRA heterodimer by 9-cis-retinoic acid. 

Mueller et al. (1998) showed that PPAR-gamma is expressed at significant levels in 
human primary and metastatic breast adenocarcinomas. Ligand activation of this receptor in 
cultured breast cancer cells caused extensive lipid accumulation, changes in breast epithelial
gene expression associated with a more differentiated, less malignant state, and a reduction in 
growth rate and clonogenic capacity of the cells. Inhibition of MAP kinase, a powerful negative 
regulator of PPAR-gamma, improves the thiazolidinedione ligand sensitivity of nonresponsive 
cells. These data suggested that the PPAR-gamma transcriptional pathway can induce terminal 
differentiation of malignant breast epithelial cells.

Tontonoz et al. (1994) identified a novel adipocyte-specific transcription factor, which 
they termed ARF6, and showed that it is a heterodimeric complex of RXRA and PPARG. (This 
ARF6 is not to be confused with ADP-ribosylation factor 6 (600464), which is also symbolized
ARF6.) Tontonoz et al. (1995) demonstrated that PPAR-gamma-2 regulates adipocyte expression
of the phosphoenolpyruvate carboxykinase gene (PCK1, 261680; PCK2, 261650).

The formation of foam cells from macrophages in the arterial wall is characterized by 
dramatic changes in lipid metabolism, including increased expression of scavenger receptors and 
the uptake of oxidized low density lipoprotein (oxLDL). Tontonoz et al. (1998) demonstrated 
that the nuclear receptor PPAR-gamma is induced in human monocytes following exposure to
oxLDL and is expressed at high levels in the foam cells of atherosclerotic lesions. Ligand
activation of the PPAR-gamma:RXR-alpha heterodimer in myelomonocytic cell lines induced
changes characteristic of monocytic differentiation and promoted uptake of oxLDL through 
transcriptional induction of the scavenger receptor CD36. These results revealed a novel 
signaling pathway controlling differentiation and lipid metabolism in monocytic cells. Tontonoz
et al. (1998) suggested that endogenous PPAR-gamma ligands may be important regulators of
gene expression during atherogenesis. 

Nagy et al. (1998) demonstrated that oxLDL activates PPAR-gamma-dependent 
transcription through a signaling pathway involving scavenger receptor-mediated particle uptake. 
Moreover, they identified 2 of the major oxidized linoleic acid metabolite components of 
oxLDL, 9-HODE and 13-HODE, as endogenous activators and ligands of PPAR-gamma. The
authors found that the biologic effects of oxLDL are coordinated by 2 sets of receptors, one on 
the cell surface, which binds and internalizes the particle, and one in the nucleus, which is 
transcriptionally activated by its component lipids. Nagy et al. (1998) suggested that
PPAR-gamma may be a key regulator of foam cell gene expression.

Chawla et al. (2001) provided evidence that in addition to lipid uptake, PPARG regulates 
a pathway of cholesterol efflux. PPARG induces ABCA1 (600046) expression and cholesterol 
removal from macrophages through a transcriptional cascade mediated by the nuclear receptor
LXRA (NR1H3; 602423). Ligand activation of PPARG leads to primary induction of LXRA and
to coupled induction of ABCA1. Transplantation of PPARG null bone marrow into Ldlr -/- mice
resulted in a significant increase in atherosclerosis, consistent with the hypothesis that regulation 
of LXRA and ABCA1 expression is protective in vivo. Chawla et al. (2001) proposed that 
PPARG coordinates a complex physiologic response to oxLDL that involves particle uptake,
processing, and cholesterol removal through ABCA1. 

Fajas et al. (1997) used competitive RT-PCR to distinguish relative PPARG1 and 
PPARG2 mRNA levels in tissues. They determined that PPARG2 is much less abundant than 
PPARG1. The highest levels of PPARG are found in adipose tissue and large intestine, with
intermediate levels in kidney, liver, and small intestine, and barely detectable levels in muscle.
Western blot analysis showed that PPARG is expressed as a 60-kD protein. EMSA analysis 
indicated that PPARG2 binds to and transactivates through a peroxisome proliferator response 
element. The PPARG gene contains 9 exons and spans more than 100 kb. Through alternative 
transcription start sites and alternate splicing, the mRNAs differ at their 5-prime ends, with
PPARG1 being encoded by 8 and PPARG2 by 7 exons. PPARG1 uses exons A1 and A2,
whereas PPARG2 uses exon B; both use exons 1 through 6. 

Martin et al. (1998) reported that there are 3 PPARG isoforms which differ at their
5-prime ends, each under the control of its own promoter. PPARG1 and PPARG3, however, give
rise to the same protein, encoded by exons 1 through 6, because neither the A1 nor the A2 exon
is translated. By RNase protection analysis, Ricote et al. (1998) showed that in phorbol
ester-stimulated macrophage cell lines, a probe to PPARG1 protected a 218-nucleotide fragment of
PPARG1, but only a 174-nucleotide fragment of PPARG3. A PPARG2 probe protected a 
common 104-nucleotide fragment of both PPARG1 and PPARG3. PPARG2 itself was not 
expressed in the stimulated macrophages. PPARG1 and PPARG2 promoters are primarily used
in adipose tissue. The authors speculated that other inducing factors, such as cytokines MCSF
(120420) or GMCSF (138960), or oxidized LDL (see OLR1, 602601), might differentially 
regulate expression of the 3 isoforms. 
Lowell (1999) reviewed the role of PPARG in adipogenesis.

Kersten et al. (2000) reviewed the roles of PPARs in health and disease. 



Tong et al. (2000) showed that murine GATA2 (137295) and GATA3 (131320) are 
specifically expressed in white adipocyte precursors and that their downregulation sets the stage 
for terminal differentiation. Constitutive GATA2 and GATA3 expression suppressed adipocyte
differentiation and trapped cells at the preadipocyte stage. This effect was mediated, at least in 
part, through the direct suppression of PPARG. 

Mueller et al. (2000) showed that PPAR-gamma is expressed in human prostate 
adenocarcinomas and cell lines derived from these tumors. Activation of this receptor with
specific ligands exerts an inhibitory effect on the growth of prostate cancer (176807) cell lines.
They showed that prostate cancer and cell lines do not have intragenic mutations in the PPARG 
gene, although 40% of the informative tumors have hemizygous deletions of this gene. They 
conducted a phase II clinical study in patients with advanced prostate cancer using troglitazone 
(Rezulin), a PPAR-gamma ligand used for the treatment of type II diabetes. Oral treatment was
administered to 41 men with histologically confirmed prostate cancer and no symptomatic
metastatic disease. An unexpectedly high incidence of prolonged stabilization of prostate- 
specific antigen (KLK3; 176820) was seen in patients treated with troglitazone. In addition, 1 
patient had a dramatic decrease in serum prostate-specific antigen to nearly undetectable levels. 
The findings suggested that PPAR-gamma may serve as a biologic modifier in human prostate
cancer and that its therapeutic potential should be further studied.

By somatic cell hybridization and linkage analysis, Greene et al. (1995) mapped the 
human PPARG gene to 3p25. Beamer et al. (1997) mapped the gene to 3p25 by fluorescence in 
situ hybridization. 

Meirhaeghe et al. (1998) detected a polymorphism corresponding to a silent C-to-T 
substitution in exon 6 of the PPARG gene (601487.0009).

Since PPARG is a transcription factor that has a key role in adipocyte differentiation, 
Ristow et al. (1998) investigated whether mutations of the gene encoding this factor predispose 
people to obesity. They studied 358 unrelated German subjects, including 121 obese subjects, 
looking for mutations in the PPARG2 gene at or near a site of serine phosphorylation at position 
114 that negatively regulates transcriptional activity of the protein. Four of the 121 obese
subjects had a missense mutation in the PPARG2 gene that resulted in conversion of proline to
glutamine at position 115 (601487.0001), as compared with none of the 237 subjects of normal 
weight. All the subjects with the mutant allele were markedly obese. Overexpression of the 
mutant gene in murine fibroblasts led to the production of a protein in which the phosphorylation 
of serine at position 114 was defective, as well as accelerated differentiation of the cells into
adipocytes and greater cellular accumulation of triglyceride than with the wildtype
PPAR-gamma-2. These effects were similar to those of an in vitro mutation created directly at the
ser114 phosphorylation site.

PPARG1 and PPARG2 have ligand-dependent and -independent activation domains.
PPARG2 has an additional 28 amino acids at the amino terminus that render its
ligand-independent activation domain 5- to 10-fold more effective than that of PPARG1. Insulin
stimulates the ligand-independent activation of PPARG1 and PPARG2; however, obesity and 
nutritional factors influence only the expression of PPARG2 in human adipocytes. Deeb et al. 
(1998) reported that a relatively common pro12-to-ala substitution in PPARG2 (601487.0002) is
associated with lower body mass index and improved insulin sensitivity among middle-aged and
elderly Finns. A significant odds ratio (4.35, P = 0.028) for the association of the pro/pro 
genotype with type 2 diabetes was observed among Japanese Americans. The PPARG2 ala allele 
showed decreased binding affinity to the cognate promoter element and reduced ability to 
transactivate responsive promoters. These findings suggested that the PPARG2 pro12-to-ala
polymorphism may contribute to the observed variability in BMI and insulin sensitivity in the
general population. 

Valve et al. (1999) investigated the frequencies of the pro12-to-ala polymorphism in exon
B and the silent CAC478-to-CAT polymorphism in exon 6 of the PPARG gene and their effects
on body weight, body composition, and energy expenditure in obese Finnish patients. The
frequencies of the ala12 allele in exon B and the CAT478 allele in exon 6 were not significantly
different between the obese and population-based control subjects (0.14 vs 0.13 and 0.19 vs 0.21,
respectively). The polymorphisms were associated with increased BMI, and the 5 women with
both ala12ala and CAT478CAT genotypes were significantly more obese compared with the
women having both pro12pro and CAC478CAC genotypes, and they had increased fat mass. The
authors concluded that the pro12-to-ala and CAC478-to-CAT polymorphisms in the PPARG
gene are associated with severe overweight and increased fat mass among obese women.




Sarraf et al. (1999) identified 4 somatic mutations (1 nonsense, 1 frameshift, and 2 
missense) in the PPARG gene among 55 sporadic colon cancers (114500). Each mutation greatly 
impaired the function of the PPARG protein. The 472delA mutation (601487.0003) resulted in 
the deletion of the entire ligand binding domain. Q286P (601487.0004) and K319X 
(601487.0005) retained a total or partial ligand binding domain but lost the ability to activate
transcription through a failure to bind to ligands. R288H (601487.0006) showed a normal 
response to synthetic ligands but greatly decreased transcription and binding when exposed to 
natural ligands. These data indicated that colon cancer in humans is associated with loss-of- 
function mutations in the PPARG gene. 

Barroso et al. (1999) reported 2 different heterozygous mutations in the ligand-binding
domain of PPARG in 3 subjects with severe insulin resistance (604367). In the PPAR-gamma
crystal structure, the mutations destabilized helix 12, which mediates transactivation. Consistent 
with this, both receptor mutants were markedly transcriptionally impaired and, moreover, were 
able to inhibit the action of coexpressed wildtype PPAR-gamma in a dominant-negative manner.
In addition to insulin resistance, all 3 subjects developed type 2 diabetes mellitus and
hypertension at an unusually early age. Barroso et al. (1999) concluded that their findings
represented the first germline loss-of-function mutations in PPAR-gamma and provided 
compelling genetic evidence that this receptor is important in the control of insulin sensitivity, 
glucose homeostasis, and blood pressure in man. 

Kroll et al. (2000) reported that t(2;3)(q13;p25), a translocation identified in a subset of
human thyroid follicular carcinomas, results in fusion of the DNA-binding domains of the
thyroid transcription factor PAX8 (167415) to domains A to F of PPARG1. PAX8/PPARG1 
mRNA and protein were detected in 5 of 8 thyroid follicular carcinomas but not in 20 follicular 
adenomas, 10 papillary carcinomas, or 10 multinodular hyperplasias. PAX8/PPARG1 inhibited
thiazolidinedione-induced transactivation by PPARG1 in a dominant-negative manner. The
experiments demonstrated an oncogenic role for PPARG and suggested that PAX8/PPARG1 
may be useful in the diagnosis and treatment of thyroid carcinoma. 

ANIMAL MODEL 

The nuclear hormone receptor PPARG promotes adipogenesis and macrophage 
differentiation and is a primary pharmacologic target in the treatment of type II diabetes. Barak
et al. (1999) showed that PPARG gene knockout in mice resulted in 2 independent lethal phases. 
Initially, PPARG deficiency interfered with terminal differentiation of the trophoblast and 
placental vascularization, leading to severe myocardial thinning and death by E10.0.
Supplementing PPARG null embryos with wildtype placentas via aggregation with tetraploid 
embryos corrected the cardiac defect, implicating a previously unrecognized dependence of the 
developing heart on a functional placenta. A tetraploid-rescued mutant surviving to term 
exhibited another lethal combination of pathologies, including lipodystrophy and multiple
hemorrhages. These findings both confirmed and expanded the current known spectrum of 
physiologic functions regulated by PPARG. 

Kubota et al. (1999) generated homozygous PPARG-deficient mouse embryos, which 
died at 10.5 to 11.5 days postcoitum due to placental dysfunction. Heterozygous
PPARG-deficient mice were protected from the development of insulin resistance due to adipocyte
hypertrophy under a high-fat diet. These phenotypes were abrogated by PPARG agonist 
treatment. Heterozygous PPARG-deficient mice showed overexpression and hypersecretion of 
leptin despite the smaller size of adipocytes and decreased fat mass, which may explain these 
phenotypes at least in part. This study revealed an unpredicted role for PPARG in high-fat
diet-induced obesity due to adipocyte hypertrophy and insulin resistance, which requires both alleles
of PPARG. 

Rosen et al. (1999) demonstrated that mice chimeric for wildtype and PPARG null cells 
showed little or no contribution of null cells to adipose tissue, whereas most other organs 
examined did not require PPARG for proper development. In vitro, the differentiation of 
embryonic stem cells into fat was shown to be dependent on PPARG gene dosage. These data
provided direct evidence that PPARG is essential for the formation of fat. 

The thiazolidinedione (TZD) class of insulin-sensitizing, antidiabetic drugs interacts with 
PPAR-gamma. Miles et al. (2000) conducted metabolic studies in PPARG gene knockout mice. 
Because homozygous PPARG-null mice die in development, they studied glucose metabolism in
mice heterozygous for the mutation. They identified no statistically significant differences in
body weight, basal glucose, insulin, or free fatty acid levels between the wildtype and 
heterozygous groups. Nor was there a difference in glucose excursion between the groups of 
mice during oral glucose tolerance tests. However, insulin concentrations of the wildtype group 
were greater than those of the heterozygous deficient group, and insulin-induced increase in
glucose disposal rate was significantly increased in the heterozygous mice. Likewise, the
insulin-induced suppression of hepatic glucose production was significantly greater in the heterozygous
mice than in wildtype mice. Taken together, these results indicated that, counterintuitively,
although pharmacologic activation of PPAR-gamma improves insulin sensitivity, a similar effect
is obtained by genetically reducing the expression levels of the receptor. 

ALLELIC VARIANTS (selected examples)

.0001 OBESITY, SEVERE [PPARG, PRO115GLN]

In 4 German subjects with severe obesity (601665), Ristow et al. (1998) identified a
pro115-to-gln mutation of the PPAR-gamma-2 gene. Significantly, the mutation was in the
codon immediately adjacent to a serine-114 phosphorylation site. The pro115-to-gln mutation
occurs in exon 6, which is shared by all 3 forms of PPAR-gamma (Wang et al., 1999).

.0002 PPARG2 POLYMORPHISM C/G [PPARG, PRO12ALA]

OBESITY, PROTECTION AGAINST DIABETES MELLITUS, TYPE II,
SUSCEPTIBILITY TO, INCLUDED

Because the product of the PPARG gene is a nuclear
receptor that regulates adipocyte differentiation and possibly lipid metabolism and insulin 
sensitivity, Yen et al. (1997) screened for mutations in the entire coding region of the PPARG 
gene in 26 diabetic Caucasians with or without obesity (601665). They found a CCG (pro)-to-GCG
(ala) missense mutation at codon 12 (P12A). The allele frequency of the mutation varied
from 0.12 in Caucasian Americans to 0.10 in Chinese. Beamer et al. (1998) noted that the amino 
acid position of the P12A mutation is within the domain of PPAR-gamma-2 that enhances 
ligand-independent activation, that the substitution of alanine for proline is nonconservative, and 
that this amino acid change might cause a significant alteration in protein structure. To test the
hypothesis that individuals with the variant are at increased genetic risk for obesity and/or insulin
resistance, they performed association studies in 2 independently recruited cohorts of unrelated, 
nondiabetic, adult Caucasian subjects. They found that the P12A mutation was associated with 
higher BMI in the 2 cohorts, suggesting that the mutation may contribute to genetic susceptibility 
for the multifactorial disorder of obesity. 

Deeb et al. (1998) studied a polymorphism of the PPARG gene, a C-to-G variant that
created an HgaI restriction site and predicted the substitution of alanine for proline at position 12
in the PPARG2-specific exon B. In a group of Finnish men and women with a PPARG2 ala
allele frequency of 0.12, they found that this allele was associated with lower fasting insulin
levels (P = 0.011) and BMI (P = 0.027) and higher insulin sensitivity (P = 0.047). This
association was independent of sex. The findings were verified by studies in a group of elderly
subjects. They also studied the association of the pro12-to-ala substitution in PPARG2 with type
2 diabetes (125853) in a group of second-generation Japanese-American (Nisei) men and women 
that included individuals with type 2 diabetes, impaired glucose tolerance, and normal controls. 
The ala allele was less frequent among subjects with type 2 diabetes (0.022) than among normal 
controls (0.092). The odds ratio for association of pro/pro with diabetes was significant (4.35, P
= 0.028), whereas the frequency of the ala allele among impaired glucose tolerance subjects was 
intermediate (0.039). Deeb et al. (1998) suggested that the lower transactivation capacity of the 
ala variant of PPARG2 underlies the association of this allele with lower BMI and higher insulin 
sensitivity. The ala isoform may lead to less efficient stimulation of PPARG target genes and 
predispose to lower levels of adipose tissue mass accumulation, which in turn may be
responsible for improved insulin sensitivity. 


Altshuler et al. (2000) evaluated 16 published genetic associations to type 2 diabetes and 
related subphenotypes using a family-based design to control for population stratification, and 
replication samples to increase power. They confirmed only 1 association, that of the common 
pro12-to-ala polymorphism in PPAR-gamma with type 2 diabetes. By analyzing over 3,000
individuals, they found a modest (1.25-fold) but significant (P = 0.002) increase in diabetes risk
associated with the more common proline allele (approximately 85% frequency). Because the
risk allele occurs at such high frequency, its modest effect translates into a large
population-attributable risk, influencing as much as 25% of type 2 diabetes in the general population.

.0003 CANCER OF COLON [PPARG, 1-BP DEL, 472A]

In a sporadic colon cancer (114500) tumor, Sarraf et al. (1999) identified a somatic
mutation in the PPARG gene, a 1-bp deletion at nucleotide 472, which resulted in a frameshift. 

.0004 CANCER OF COLON [PPARG, GLN286PRO] 

In a sporadic colon cancer (114500) tumor, Sarraf et al. (1999) identified a somatic
mutation in the PPARG gene, an A-to-G transition at nucleotide 857, which resulted in a
gln286-to-pro substitution.

.0005 CANCER OF COLON [PPARG, LYS319TER] 

In a sporadic colon cancer (114500), Sarraf et al. (1999) identified a somatic mutation in 
the PPARG gene, an A-to-T transversion at nucleotide 955, which resulted in a lys319-to-ter 
substitution.



.0006 CANCER OF COLON [PPARG, ARG288HIS] 



In a sporadic colon cancer (114500) tumor, Sarraf et al. (1999) identified a somatic 
mutation in the PPARG gene, a G-to-A transition at nucleotide 863, which resulted in an arg288- 
to-his substitution. 

.0007 DIABETES MELLITUS, INSULIN-RESISTANT, WITH ACANTHOSIS
NIGRICANS AND HYPERTENSION [PPARG, PRO467LEU]

In a patient with severe insulin resistance, type 2 diabetes mellitus, and hypertension 
(604367) who had been diagnosed in her twenties, Barroso et al. (1999) detected a C-to-T 
transition in the PPARG gene resulting in a proline-to-leucine mutation at codon 467 (P467L). 
Her son, aged 30 years, who also had a history of early-onset diabetes and hypertension, was also
heterozygous for the P467L mutation. All other family members, including both parents of the 
proband, none of whom were known to have diabetes or hypertension, were homozygous for 
wildtype receptor sequence. Nonpaternity was excluded, indicating a de novo appearance of the 
mutation in the proband. 

.0008 DIABETES MELLITUS, INSULIN-RESISTANT, WITH ACANTHOSIS
NIGRICANS AND HYPERTENSION [PPARG, VAL290MET]

In a 15-year-old patient with primary amenorrhea, hirsutism, acanthosis nigricans, 
elevated blood pressure, and markedly elevated fasting and postprandial insulin levels (604367), 
Barroso et al. (1999) identified a G-to-A transition in the PPARG gene resulting in a valine-to- 
methionine mutation at codon 290 (V290M). By age 17 the patient had developed type 2
diabetes and had hypertension which required treatment with beta-blockers. Her clinically 
unaffected mother and sister were both wildtype at this locus; screening of the deceased father 
was not possible. 

.0009 PPARG POLYMORPHISM C-T [PPARG, 161C-T]

Meirhaeghe et al. (1998) reported a 161C-T substitution in exon 6 of the PPARG gene.
Since PPAR-gamma is a transcription factor implicated in adipocyte differentiation and in lipid
and glucose metabolism, they analyzed the relationships between this genetic polymorphism and 
various markers of the obesity phenotype in a representative sample of 820 men and women 
living in northern France. The frequencies of the C and T alleles were 0.860 and 0.140,
respectively. In the whole sample, no association of the polymorphism with the markers tested
was observed, but a statistically significant interaction (P less than 0.03) existed between this
polymorphism and body mass index (BMI) for plasma leptin levels. Obese subjects bearing at 
least one T allele had higher plasma leptin levels than subjects who did not. This effect existed in 
both genders, despite the higher plasma leptin levels observed in women. Thus, for a given leptin 
level, the BMI was relatively lower in obese subjects carrying at least one T allele than in obese
CC homozygotes. 

Wang et al. (1999) studied this polymorphism in 647 Australian Caucasian patients aged 
65 years or less, with or without angiographically documented coronary artery disease. The 
frequencies of the CC, CT, and TT genotypes were 69.8%, 27.7%, and 2.5%, respectively, and
the T allele frequency was 0.163. These frequencies were in Hardy-Weinberg equilibrium and not
different between men and women. Wang et al. (1999) found that the T allele carriers (CT and 
TT genotypes) had significantly reduced coronary artery disease risk compared to the CC 
homozygotes, with an odds ratio of 0.457. Association with obesity (601665) was not found in 
these patients. The authors interpreted this to indicate that the PPARG gene may have a
significant role in atherogenesis, independent of obesity and of lipid abnormalities, possibly via a
direct local vascular wall effect. 
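The genotype figures reported by Wang et al. (1999) can be checked arithmetically. The Python sketch below is an illustration, not the authors' analysis: it derives the T allele frequency from the reported genotype percentages and computes the genotype proportions expected under Hardy-Weinberg equilibrium. The variable names are arbitrary.

```python
# Genotype frequencies reported in the text, expressed as fractions.
observed = {"CC": 0.698, "CT": 0.277, "TT": 0.025}

# Each CT heterozygote carries one T allele; each TT homozygote carries two.
q = observed["TT"] + observed["CT"] / 2   # T allele frequency
p = 1.0 - q                               # C allele frequency

# Hardy-Weinberg expectations: p^2 (CC), 2pq (CT), q^2 (TT).
expected = {"CC": p * p, "CT": 2 * p * q, "TT": q * q}

print(f"T allele frequency: {q:.4f}")
for genotype in observed:
    print(genotype, f"observed {observed[genotype]:.3f}", f"expected {expected[genotype]:.3f}")
```

The derived allele frequency (0.1635) agrees with the reported 0.163 to within rounding, and the expected genotype proportions lie close to the observed ones, which is consistent with the stated Hardy-Weinberg equilibrium.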

Using a subtractive cloning strategy to identify downstream targets of peroxisome 
proliferator-activated receptor-gamma (PPARG; 601487), and by screening cDNA libraries, 
Yoon et al. (2000) isolated mouse and human cDNAs encoding PGAR. The 406-amino acid,
60-kD human PGAR protein, which shares 75% amino acid identity with the mouse protein, is a
member of the angiopoietin family of secreted proteins and bears highest similarity to 
angiopoietin-2 (ANGPT2; 601922). Like other members of this family, PGAR contains a 
predicted coiled-coil quaternary structure, and the authors hypothesized that PGAR may form 
multimeric or other higher-order structures. PGAR has a secretory signal peptide, 3 potential
N-glycosylation sites, and 4 cysteines that may be available for intramolecular disulfide bonding.
Northern blot analysis detected a 2-kb PGAR transcript that was highly enriched in white fat and 
placenta. In situ hybridization analysis revealed expression of mouse Pgar at low levels in most 
organs and connective tissue at embryonic day 13.5 (E13.5). Between E15.5 and E18.5, strongest 
expression of Pgar was in brown fat. Northern blot analysis detected elevated levels of Pgar
expression in mouse models of obesity and diabetes. Alterations in nutrition and leptin (164160)
administration in mice modulated Pgar expression in vivo. Yoon et al. (2000) demonstrated that 
PPARG ligand-induced transcription of PGAR follows a rapid time course typical of immediate- 
early genes and occurs in the absence of protein synthesis. Using a culture model system, they 
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observed that induction of the PGAR transcript coincides with hormone-dependent adipocyte 
differentiation. Yoon et al. (2000) concluded that PGAR is a bona fide target of PPARG and may 
have a role in regulation of systemic lipid metabolism or glucose homeostasis. 

Kersten et al. (2000) identified mouse Pgar, which they called Fiaf (fasting-induced
adipose factor), using a subtractive hybridization assay to identify PPARA (170998) target
genes. Northern blot analysis detected expression of Fiaf in mouse white and brown adipose
tissue, with weak expression in lung, kidney, and liver. Using a combination of wild-type, Ppara
mutant, and Pparg mutant mice, Kersten et al. (2000) demonstrated that mRNA expression is
stimulated by PPARA in liver and by PPARG in white adipose tissue. Expression of Fiaf was
upregulated in liver and white adipose tissue during fasting. Western blot analysis showed that
the abundance of Fiaf in plasma decreased with high fat feeding, an effect directly opposite that
observed with leptin.

By radiation hybrid analysis, Yoon et al. (2000) mapped the PGAR gene to 19p13.3.

The DNA and protein sequences for the novel Angiopoietin-like gene are reported here
as CuraGen Acc. No. CG57051-04.

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 716 of 733 bases (97%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (Table 23). The full amino acid sequence of the protein of
the invention was found to have 181 of 183 amino acid residues (98%) identical to, and 182 of
183 amino acid residues (99%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9HBV4 protein from Homo sapiens (Human) (ANGIOPOIETIN-LIKE PROTEIN
PP1158) (Table 24).
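
The percentages quoted for these database hits can be recomputed from the raw counts. The following is a minimal sketch only; the helper name `percent_truncated` is introduced here for illustration, and truncation to a whole number (rather than rounding) is an assumption chosen because it reproduces the figures reported in Tables 23 and 24.

```python
def percent_truncated(matches: int, aligned: int) -> int:
    """Return matches/aligned as a whole-number percentage, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    if not 0 <= matches <= aligned:
        raise ValueError("matches must lie between 0 and aligned")
    return (100 * matches) // aligned


# Counts taken from the Similarities paragraph.
nucleotide_identity = percent_truncated(716, 733)
protein_identity = percent_truncated(181, 183)
protein_similarity = percent_truncated(182, 183)

print(nucleotide_identity, protein_identity, protein_similarity)
```

Integer arithmetic is used so that the result does not depend on floating-point rounding.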

A multiple sequence alignment is given in Table 26, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences. Please note this sequence represents a splice form of Angiopoietin as
indicated in positions 184L to 347G and SNPs: Q24R and G25S.
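
The substitution notation used here (for example Q24R: glutamine at position 24 replaced by arginine) can be applied mechanically to a reference sequence. The sketch below is illustrative; the function name `apply_substitutions` is hypothetical, and the reference string is the first 40 residues of Q9HBV4 as aligned in Table 24.

```python
import re


def apply_substitutions(reference: str, variants: list[str]) -> str:
    """Apply single-residue substitutions written as <ref><position><new>, 1-based."""
    residues = list(reference)
    for variant in variants:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
        if match is None:
            raise ValueError(f"unrecognized variant notation: {variant!r}")
        ref_aa, position, new_aa = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != ref_aa:
            raise ValueError(f"{variant}: reference has {residues[position - 1]} at {position}")
        residues[position - 1] = new_aa
    return "".join(residues)


# N-terminal 40 residues of the Q9HBV4 subject sequence (Table 24).
reference_n_terminus = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDE"
variant_n_terminus = apply_substitutions(reference_n_terminus, ["Q24R", "G25S"])
print(variant_n_terminus)
```

Checking the stated reference residue before substituting catches off-by-one errors in the position numbering.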

The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the Interpro domain accession number. Significant domains are summarized below:

Model        Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
fibrinogenC  1/1     184    236 ..    204    272 .]    31.7   4.1e-08
IPR002181; Fibrinogen^ 

Fibrinogen [1], the principal protein of vertebrate blood clotting, is a hexamer containing
two sets of three different chains (alpha, beta, and gamma), linked to each other by disulfide
bonds. The N-terminal sections of these three chains are evolutionarily related and contain the
cysteines that participate in the cross-linking of the chains. However, there is no similarity
between the C-terminal part of the alpha chain and that of the beta and gamma chains. The
C-terminal part of the beta and gamma chains forms a domain of about 270 amino-acid residues.
As shown in the schematic representation, this domain contains four conserved cysteines
involved in two disulfide bonds. (SEQ ID NO: 126)

xxxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx
    |            |                                 |     |
    +------------+                                 +-----+

'C': conserved cysteine involved in a disulfide bond.

Such a domain has been recently found in other proteins, which are listed below.

Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins of
about 260 amino acids, which have a fibrinogen beta/gamma C-terminal domain.

In the C-terminus of Drosophila protein scabrous (gene sca). Scabrous is involved in the
regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.

In the C-terminus of a mammalian T-cell specific protein of unknown function.

In the C-terminus of a human protein of unknown function which is encoded on the
opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested that it could be
involved in protein-protein interactions.


This indicates that the sequence of the invention has properties similar to those of other 
proteins known to contain this/these domain(s) and similar to the properties of these domains. 



Chromosomal information: 

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19p13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen
Corporation's Electronic Northern bioinformatic tool.

Tissue expression 

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: Adipose, Heart, Aorta, Coronary Artery, Umbilical Vein, Adrenal
Gland/Suprarenal gland, Pancreas, Islets of Langerhans, Thyroid, Pineal Gland, Parotid Salivary
glands, Liver, Small Intestine, Duodenum, Colon, Bone Marrow, Lymph node, Bone, Cartilage,
Synovium/Synovial membrane, Skeletal Muscle, Brain, Thalamus, Pituitary Gland, Amygdala,
Hippocampus, Spinal Cord, Mammary gland/Breast, Ovary, Placenta, Uterus, Vulva, Prostate,
Testis, Lung, Kidney, Retina, Skin, Foreskin. Expression information was derived from the
tissue sources of the sequences that were included in the derivation of the sequence of CuraGen
Acc. No. CG57051-04.

Cellular Localization and Sorting 

The PSORT, SignalP and hydropathy profile for the Angiopoietin-like protein are shown
in Table 27. Although PSORT suggests that the Angiopoietin-like protein may be localized in
the cytoplasm, the protein of CuraGen Acc. No. CG57051-04 predicted here is similar to the
Fibrinogen family, some members of which are secreted. Therefore it is likely that this novel
Angiopoietin-like protein is localized to the same sub-cellular compartment.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the
nucleic acid whose sequence is provided in Figure 20, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Fig. 1 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-04, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 3% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 20. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 20 while still
maintaining its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 2% of the amino acid
residues may be so changed.

Chimeric and Fusion Proteins 

The present invention includes chimeric or fusion proteins of the Angiopoietin-like
protein, in which the Angiopoietin-like protein of the present invention is joined to a second
polypeptide or protein that is not substantially homologous to the present novel protein. The
second polypeptide can be fused to either the amino-terminus or carboxyl-terminus of the present
CG57051-04 polypeptide. In certain embodiments a third nonhomologous polypeptide or protein
may also be fused to the novel Angiopoietin-like protein such that the second nonhomologous
polypeptide or protein is joined at the amino terminus, and the third nonhomologous polypeptide
or protein is joined at the carboxyl terminus, of the CG57051-04 polypeptide. Examples of
nonhomologous sequences that may be incorporated as either a second or third polypeptide or
protein include glutathione S-transferase, a heterologous signal sequence fused at the amino
terminus of the Angiopoietin-like protein, an immunoglobulin sequence or domain, a serum
protein or domain thereof (such as a serum albumin), an antigenic epitope, and a specificity
motif such as (His)6.

The invention further includes nucleic acids encoding any of the chimeric or fusion 
proteins described in the preceding paragraph. 
Antibodies

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map 
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like 
protein may have important structural and/or physiological functions characteristic of the 
Fibrinogen family. Therefore, the nucleic acids and proteins of the invention are useful in 
potential diagnostic and therapeutic applications and as a research tool. These include serving as 
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the 
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small 
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic 
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent 
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon. 

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes;
obesity; colon cancer; diabetes mellitus, insulin-resistant, with acanthosis nigricans and
hypertension; 3-methylglutaconicaciduria, type III; cone-rod retinal dystrophy-2; DNA ligase I
deficiency; glutaricaciduria, type IIB; liposarcoma; and myotonic dystrophy, as well as other
diseases, disorders and conditions.

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in diagnostic and/or 
therapeutic methods. 
Table 23. BLASTN search using CuraGen Acc. No. CG57051-04.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp.
Length = 1943 (seq id no:79)

Plus Strand HSPs:

Score = 3468 (520.3 bits), Expect = 7.8e-202, Sum P(2) = 7.8e-202
Identities = 716/733 (97%), Positives = 716/733 (97%), Strand = Plus / Plus

Query:    2 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 61
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   20 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 79

Query:   62 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 121
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   80 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 139

Query:  122 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 181
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  140 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 199

Query:  182 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCT-AGATCTGGACCCGTGCA 240
            |||||||||||||||||||||||||||||||||||||||||| ||  | |||||||||||
Sbjct:  200 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGC-GGACCCGTGCA 258

Query:  241 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  259 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 318

Query:  301 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  319 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 378

Query:  361 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  379 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 438

Query:  421 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  439 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 498

Query:  481 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  499 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 558

Query:  541 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  559 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 618

Query:  601 CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 660
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  619 CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 678

Query:  661 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCGAG-GCTGGTGGTTTGGCA 719
            ||||||||||||||||||||||||||||||||||||||||||   ||    || ||| ||
Sbjct:  679 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCGGCTGCCCAGGGATTGCCA 738

Query:  720 CCTGCAGCCATTCCA 734
               | |||  |||||
Sbjct:  739 G--G-AGCTGTTCCA 750

Score = 1182 (177.3 bits), Expect = 7.8e-202, Sum P(2) = 7.8e-202
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Plus / Plus

Query:  693 GCCTGCACCG-AGGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTAC 751
            |||| | | | |||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1203 GCCT-CTCTGGAGGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTAC 1261

Query:  752 TTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGG 811
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1262 TTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGG 1321

Query:  812 CGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAG 871
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1322 CGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAG 1381

Query:  872 GCAGCCTCCTAGCGTCCTGGCTGGGCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTG 931
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1382 GCAGCCTCCTAGCGTCCTGGCTGGGCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTG 1441

Query:  932 GCTCTG 937
            ||||||
Sbjct: 1442 GCTCTG 1447



Table 24. BLASTP search using the protein of CuraGen Acc No. CG57051-04. 

>ptnr:SPTREMBL-ACC:Q9HBV4 ANGIOPOIETIN-LIKE PROTEIN PP1158 - Homo sapiens
(Human), 406 aa. (seq id no:80)
Length = 406

Score = 929 (327.0 bits), Expect = 4.4e-126, Sum P(2) = 4.4e-126
Identities = 181/183 (98%), Positives = 182/183 (99%)

Query:    1 MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60
            |||||||||||||||||||||||+ |||||||||||||||||||||||||||||||||||
Sbjct:    1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60

Query:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180

Query:  181 LHR 183
            |||
Sbjct:  181 LHR 183

Score = 333 (117.2 bits), Expect = 4.4e-126, Sum P(2) = 4.4e-126
Identities = 60/62 (96%), Positives = 60/62 (96%)

Query:  181 LHRGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAAEA 240
            |  |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  345 LSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAAEA 404

Query:  241 AS 242
            ||
Sbjct:  405 AS 406

Score = 49 (17.2 bits), Expect = 2.4e-33, Sum P(2) = 2.4e-33
Identities = 14/40 (35%), Positives = 20/40 (50%)

Query:    1 MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDE 40
            + |  ||  +| | |  |  | | + |    |  |++||+
Sbjct:  293 LGGEDTA-YSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQ 331



Table 25. BLASTN identity search of CuraGen Corporation's Human SeqCalling database using 
CuraGen Acc. No. CG57051-04. 



>s3aq:230527544 , 2394 bp. (seq id no:81)
Length = 2394

Minus Strand HSPs:

Score = 3468 (520.3 bits), Expect = 1.2e-202, Sum P(2) = 1.2e-202
Identities = 716/733 (97%), Positives = 716/733 (97%), Strand = Minus / Plus

Query:  734 TGGAATGGCTGCAGGTGCCAAACCACCAGCCTC-GGTGCAGGCGGCTGACATTGTGAGCC 676
            |||||  ||| | ||   ||| ||    ||  | ||||||||||||||||||||||||||
Sbjct: 1645 TGGAACAGCTCCTGG---CAATCCCTGGGCAGCCGGTGCAGGCGGCTGACATTGTGAGCC 1701

Query:  675 GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 616
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1702 GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 1761

Query:  615 TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 556
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1762 TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 1821

Query:  555 TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 496
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1822 TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 1881

Query:  495 CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTCAGGGTCCACCCGGCTC 436
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1882 CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTCAGGGTCCACCCGGCTC 1941

Query:  435 TCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 376
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1942 TCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 2001

Query:  375 CTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 316
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2002 CTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 2061

Query:  315 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 256
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2062 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 2121

Query:  255 CGCGGCGACTTGGACTGCACGGGTCCAGATCT-AGCGCTCAGTAGCACGGCGGTGGCGGC 197
            |||||||||||||||||||||||||| |  || |||||||||||||||||||||||||||
Sbjct: 2122 CGCGGCGACTTGGACTGCACGGGTCC-GCCCTGAGCGCTCAGTAGCACGGCGGTGGCGGC 2180

Query:  196 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 137
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2181 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 2240

Query:  136 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 77
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2241 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 2300

Query:   76 GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 17
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 2301 GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 2360

Query:   16 CGTGTGAGGATCCGC 2
            |||||||||||||||
Sbjct: 2361 CGTGTGAGGATCCGC 2375

Score = 1182 (177.3 bits), Expect = 1.2e-202, Sum P(2) = 1.2e-202 (seq id no:127)
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

Query:  937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  948 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 1007

Query:  877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1008 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 1067

Query:  817 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1068 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 1127

Query:  757 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG-GT 699
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
Sbjct: 1128 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 1187

Query:  698 GCAGGC 693
            | ||||
Sbjct: 1188 G-AGGC 1192



>s3aq:218296061 , 1862 bp. (seq id no:82)
Length = 1862

Minus Strand HSPs:

Score = 3444 (516.7 bits), Expect = 1.8e-201, Sum P(2) = 1.8e-201
Identities = 714/733 (97%), Positives = 714/733 (97%), Strand = Minus / Plus

Query:  734 TGGAATGGCTGCAGGTGCCAAACCACCAGCCTC-GGTGCAGGCGGCTGACATTGTGAGCC 676
            |||||  ||| | ||   ||| ||    ||  | ||||||||||||||||||||||||||
Sbjct: 1133 TGGAACAGCTCCTGG---CAATCCCTGGGCAGCCGGTGCAGGCGGCTGACATTGTGAGCC 1189

Query:  675 GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 616
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1190 GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 1249

Query:  615 TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 556
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1250 TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 1309

Query:  555 TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 496
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1310 TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 1369

Query:  495 CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTCAGGGTCCACCCGGCTC 436
            |||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
Sbjct: 1370 CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCCCAGGGTCCACCCGGCTC 1429

Query:  435 [sequence not legible in source] 376
Sbjct: 1430 [sequence not legible in source] 1489

Query:  375 CTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 316
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1490 CTCAGGCGC-GCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 1548

Query:  315 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 256
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1549 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 1608

Query:  255 [sequence not legible in source] 197
Sbjct: 1609 [sequence not legible in source] 1667

Query:  196 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 137
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1668 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 1727

Query:  136 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 77
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1728 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 1787

Query:   76 GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 17
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1788 GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 1847

Query:   16 CGTGTGAGGATCCGC 2
            |||||||||||||||
Sbjct: 1848 CGTGTGAGGATCCGC 1862

Score = 1182 (177.3 bits), Expect = 1.8e-201, Sum P(2) = 1.8e-201 (seq id no:138)
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

Query:  937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  436 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 495

Query:  877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  496 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 555

Query:  817 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  556 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 615

Query:  757 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG-GT 699
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
Sbjct:  616 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 675

Query:  698 GCAGGC 693
            | ||||
Sbjct:  676 G-AGGC 680



>s3aq:217940431 Category E: , 530 bp. (seq id no:83)
Length = 530

Minus Strand HSPs:

Score = 1800 (270.1 bits), Expect = 1.2e-75, P = 1.2e-75
Identities = 384/403 (95%), Positives = 384/403 (95%), Strand = Minus / Plus

Query:  631 AGGCTTGGCCACC-TCATGGTCTAGGTG-CTT-GTGGTCCAG-GAGGCCAAACTGGCTTT 576
            || | ||| | || ||| | || | ||| ||  ||   |||  |||||||  ||||||||
Sbjct:  128 AGCCCTGGTCCCCGTCA-G-TCAATGTGACTGAGTCCGCCATTGAGGCCAGTCTGGCTTT 185

Query:  575 GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 516
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  186 GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 245

Query:  515 GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 456
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  246 GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 305

Query:  455 CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 396
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  306 CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 365

Query:  395 GACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGC 336
            ||||||||||||||||||||||||||||||| | ||||||||||||||||||||||||||
Sbjct:  366 GACAGGCGGACCCGCACGCGCTCAGGCGCCGTTTCAGCGCGCTCAGCTGACTGCGGGTGC 425

Query:  335 GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 276
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  426 GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 485

Query:  275 TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 231
            |||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  486 TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 530

>s3aq:230121563 , 788 bp. (SEQ ID NO: 84)
Length = 788

Minus Strand HSPs:

Score = 1182 (177.3 bits), Expect = 6.4e-48, P = 6.4e-48
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

Query:  937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  171 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 230

Query:  877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  231 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 290

Query:  817 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  291 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 350

Query:  757 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG-GT 699
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
Sbjct:  351 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 410

Query:  698 GCAGGC 693
            | ||||
Sbjct:  411 G-AGGC 415

>s3aq:217939973 , 631 bp. (SEQ ID NO: 85)
Length = 631

Minus Strand HSPs:

Score = 1182 (177.3 bits), Expect = 8.0e-48, P = 8.0e-48
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

Query:  937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  105 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 164

Query:  877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  165 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 224

Query:  817 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  225 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 284

Query:  757 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG-GT 699
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
Sbjct:  285 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 344

Query:  698 GCAGGC 693
            | ||||
Sbjct:  345 G-AGGC 349

>s3aq:217939964 , 328 bp. (SEQ ID NO:86)
Length = 328

Plus Strand HSPs:

Score = 777 (116.6 bits), Expect = 3.0e-29, P = 3.0e-29
Identities = 157/159 (98%), Positives = 157/159 (98%), Strand = Plus / Plus

Query:  779 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 838
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 60

Query:  839 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 898
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 120

Query:  899 TGGTCCCAGGCCCACGAAAGACGGTGACTCTTGGCTCTG 937
            |||||||||||| |||||||||||||||||||||||| |
Sbjct:  121 TGGTCCCAGGCCAACGAAAGACGGTGACTCTTGGCTCCG 159



Table 26. ClustalW alignment of CG57051-04 protein with related proteins.

Information for the ClustalW proteins:

Accn                        Common Name                                  Length
CG57051-04 (SEQ ID NO:51)   novel Angiopoietin-like protein              242
CG57051-02 (SEQ ID NO:55)   Angiopoietin Related protein / PPAR-gamma    386
Q9HBV4 (SEQ ID NO:80)       ANGIOPOIETIN-LIKE PROTEIN PP1158.            406
CG57051-03 (SEQ ID NO:57)   Angiopoietin-like protein- isoform 3         368



In the alignment shown above, black outlined amino acid residues indicate residues
identically conserved between sequences (i.e., residues that may be required to preserve
structural or functional properties); amino acid residues with a gray background are similar to
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar
between sequences.

Table 27. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-04.

endoplasmic reticulum (membrane) --- Certainty=0.8200(Affirmative) < succ>
plasma membrane --- Certainty=0.1900(Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.1701(Affirmative) < succ>
endoplasmic reticulum (lumen) --- Certainty=0.1000(Affirmative) < succ>

INTEGRAL Likelihood = -4.04 Transmembrane 7 - 23 ( 4 - 25)

Seems to be a Type Ib (Nexo Ccyt) membrane protein

Is the sequence a signal peptide?

# Measure  Position  Value  Cutoff  Conclusion
  max.C    31        0.427  0.37    YES
  max.Y    31        0.473  0.34    YES
  max.S    8         0.952  0.88    YES
  meanS    1-30      0.738  0.48    YES

# Most likely cleavage site between pos. 30 and 31: VQS-KS

SECP16

A SECP16 nucleic acid and polypeptide according to the invention were obtained by
exon linking and include the nucleic acid sequence (SEQ ID NO:52) and encoded polypeptide
sequence (SEQ ID NO:53) of clone CG57051-05, directed toward novel Angiopoietin-like
proteins and nucleic acids encoding them. Figure 21 illustrates the nucleic acid and
amino acid sequences, respectively. This clone includes a nucleotide sequence (SEQ ID NO:52)
of 1239 bp. The nucleotide sequence includes an open reading frame (ORF) beginning with an
ATG initiation codon at nucleotides 80-82 and ending with a TAG stop codon at nucleotides
1184-1186. Putative untranslated regions, if any, are found upstream from the initiation codon
and downstream from the termination codon. The encoded protein, having 368 amino acid
residues, is presented using the one-letter code in Figure 21. The protein encoded by clone
CG57051-05 is predicted by the PSORT program to be located extracellularly with a certainty of
0.7332 and has a signal peptide (see Table 28 below). The PCR product derived by exon linking,
covering the entire open reading frame, was cloned into the pCR2.1 vector from Invitrogen to
provide clone 157544::CG50847-01.891637.M13 and clone 157544::CG50847-01.891637.O5.
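
The ORF coordinates and the stated protein length can be cross-checked with simple arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it assumes 1-based inclusive nucleotide coordinates and a stop codon that is not translated.

```python
def orf_protein_length(first_nt_of_start: int, last_nt_of_stop: int) -> int:
    """Return the number of amino acids encoded by an ORF.

    Coordinates are assumed to be 1-based and inclusive, running from the
    first base of the initiation codon to the last base of the stop codon.
    """
    span = last_nt_of_stop - first_nt_of_start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    # The final codon is the stop codon and contributes no residue.
    return span // 3 - 1


# ATG at nucleotides 80-82, TAG at nucleotides 1184-1186 (clone CG57051-05)
print(orf_protein_length(80, 1186))
```

Under these assumptions the 1107-nucleotide span corresponds to 369 codons, i.e. 368 residues plus the stop codon, which agrees with the length stated for the encoded protein.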

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 867 of 1064 bases (81%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (See Table 24). The full amino acid sequence of the
protein of the invention was found to have 185 of 192 amino acid residues (96%) identical to,
and 185 of 192 amino acid residues (96%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9HBV4 protein from Homo sapiens (Human) (ANGIOPOIETIN-LIKE
PROTEIN PP1158) (See Table 25).
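
The percentages quoted with each identity count can be reproduced from the counts themselves. The sketch below assumes, as an inference from the figures in this section and not from any program documentation, that the percentage is truncated to a whole number; the function name is hypothetical.

```python
def reported_percent(identical: int, aligned: int) -> int:
    """Whole-number percent identity, truncated (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (100 * identical) // aligned


for identical, aligned in [(867, 1064), (185, 192), (526, 527)]:
    print(f"{identical}/{aligned} -> {reported_percent(identical, aligned)}%")
```

Truncation rather than rounding is what makes 526/527 come out as 99% in Table 26.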

A multiple sequence alignment is given in Table 27, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences. Please note this sequence represents a splice form of Angiopoietin,
missing exon 4, as indicated in positions 183 to 221, and with SNPs: V156G, A157G, T266M.
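
The substitution names above follow the pattern reference residue, position, variant residue. As an illustration, the sketch below derives two of them from the ungapped row covering residues 121 to 180 of the second HSP in Table 25, taking the Q9HBV4 row as the reference. The function name is hypothetical and the code does not handle gapped alignments.

```python
def substitutions(reference: str, variant: str, first_position: int) -> list[str]:
    """List point substitutions between two ungapped, equal-length aligned rows.

    first_position is the 1-based residue number of the first column.
    """
    if len(reference) != len(variant):
        raise ValueError("rows must have the same length")
    return [
        f"{ref}{first_position + i}{var}"
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


# Residues 121-180 as shown in Table 25 (Sbjct = Q9HBV4, Query = CG57051-05).
SBJCT_121_180 = "HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR"
QUERY_121_180 = "HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEGGKPARRKRLPEMAQPVDPAHNVSR"

print(substitutions(SBJCT_121_180, QUERY_121_180, 121))
```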

The presence of identifiable domains in the protein disclosed herein was determined by
searches against domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the Interpro domain accession number. Significant domains are summarized below:

Model        Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
fibrinogenC  1/2     184    246 ..    47     123 ..    98.2   4e-27
fibrinogenC  2/2     288    362 ..    178    272 .]    67.0   3.4e-18

IPR002181; (Fibrinogen_C)

Fibrinogen, the principal protein of vertebrate blood clotting, is a hexamer containing
two sets of three different chains (alpha, beta, and gamma), linked to each other by disulfide
bonds. The N-terminal sections of these three chains are evolutionarily related and contain the
cysteines that participate in the cross-linking of the chains. However, there is no similarity
between the C-terminal part of the alpha chain and that of the beta and gamma chains. The
C-terminal part of the beta and gamma chains forms a domain of about 270 amino-acid residues.
As shown in the schematic representation, this domain contains four conserved cysteines
involved in two disulfide bonds.

[Schematic representation: the domain sequence with four conserved cysteines (C) joined by
two disulfide bonds.]

'C': conserved cysteine involved in a disulfide bond.

(SEQ ID NO:126)

Such a domain has been recently found in other proteins which are listed below:

1) Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins
of about 260 amino acids which have a fibrinogen beta/gamma C-terminal domain.

2) In the C-terminus of Drosophila protein scabrous (gene sca). Scabrous is involved in
the regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.

3) In the C-terminus of a mammalian T-cell specific protein of unknown function.

4) In the C-terminus of a human protein of unknown function which is encoded on the
opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested that it could be
involved in protein-protein interactions.

This indicates that the sequence of the invention has properties similar to those of other
proteins known to contain this/these domain(s) and similar to the properties of these domains.

Chromosomal information: 

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19p13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen
Corporation's Electronic Northern bioinformatic tool.



Tissue expression 

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: Adipose, Liver, Placenta. Expression information was derived from the tissue
sources of the sequences that were included in the derivation of the sequence of CuraGen Acc.
No. CG57051-05.

Cellular Localization and Sorting 

The PSORT, SignalP and hydropathy profile for the Angiopoietin-like protein are shown
in Table 28. The results predict that this sequence has a signal peptide and is likely to be
localized extracellularly with a certainty of 0.7332. The signal peptide is predicted by SignalP to
be cleaved between amino acids 25 and 26: AQG-GP.
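
The cleavage-site notation can be read as the last three residues of the signal peptide, a hyphen, and the first two residues of the mature protein. The sketch below, with a hypothetical function name, applies that reading to the first 60 residues of the CG57051-05 protein as they appear in the query rows of Table 25.

```python
def cleavage_site(protein: str, last_signal_residue: int,
                  left: int = 3, right: int = 2) -> str:
    """Format a cleavage site as '<left flank>-<right flank>'.

    last_signal_residue is the 1-based position of the final residue of the
    signal peptide; cleavage occurs immediately after it.
    """
    if not left <= last_signal_residue <= len(protein) - right:
        raise ValueError("cleavage position is too close to a sequence end")
    before = protein[last_signal_residue - left:last_signal_residue]
    after = protein[last_signal_residue:last_signal_residue + right]
    return f"{before}-{after}"


# Residues 1-60 of the query in Table 25.
N_TERMINUS = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE"

print(cleavage_site(N_TERMINUS, 25))
```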

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the
nucleic acid whose sequence is provided in Figure 21, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 21 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-05, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 19% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 21. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 21 while still encoding
a protein that maintains its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 4% of the amino acid
residues may be so changed.

Chimeric and Fusion Proteins

The present invention includes chimeric or fusion proteins of the Angiopoietin-like
protein, in which the Angiopoietin-like protein of the present invention is joined to a second
polypeptide or protein that is not substantially homologous to the present novel protein. The
second polypeptide can be fused to either the amino-terminus or carboxyl-terminus of the present
CG57051-05 polypeptide. In certain embodiments a third nonhomologous polypeptide or protein
may also be fused to the novel Angiopoietin-like protein such that the second nonhomologous
polypeptide or protein is joined at the amino terminus, and the third nonhomologous polypeptide
or protein is joined at the carboxyl terminus, of the CG57051-05 polypeptide. Examples of
nonhomologous sequences that may be incorporated as either a second or third polypeptide or
protein include glutathione S-transferase, a heterologous signal sequence fused at the amino
terminus of the Angiopoietin-like protein, an immunoglobulin sequence or domain, a serum
protein or domain thereof (such as a serum albumin), an antigenic epitope, and a specificity
motif such as (His)6.

The invention further includes nucleic acids encoding any of the chimeric or fusion
proteins described in the preceding paragraph.

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Angiopoietin family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as a research tool. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes; obesity;
colon cancer; diabetes mellitus, insulin-resistant, with acanthosis nigricans and hypertension;
3-methylglutaconicaciduria, type III; Cone-rod retinal dystrophy-2; DNA ligase I deficiency;
Glutaricaciduria, type IIB; Liposarcoma; Myotonic dystrophy; as well as other diseases, disorders
and conditions.

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in diagnostic and/or 
therapeutic methods. 

Table 24. BLASTN search using CuraGen Acc. No. CG57051-05.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO:87)
Length = 1943

Plus Strand HSPs:

Score = 3105 (465.9 bits), Expect = 2.0e-134, P = 2.0e-134
Identities = 867/1064 (81%), Positives = 867/1064 (81%), Strand = Plus / Plus

Sbjct:   97 CGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGTCCCCGAATCCCCGCTCC 156

Query:   64 CAGGCTACCTAAGAGGATGAGCGGCGCTCCGACGGCCGGGGCAGCCCTGATGCTCTGCGC 123
            |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct:  157 CAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCAGCCCTGATGCTCTGCGC 216

Query:  124 CGCCACCGCCGTGCTACTGAGCGCTCAGGGCGGACCCGTGCAGTCCAAGTCGCCGCGCTT 183

Query: 


4 


Sbjct: 


97 


Query: 


64 


Sbjct: 


157 


Query: 


124 


Sbjct: 


217 


Query: 


184 


Sbjct: 


277 



45 urn inn iii ii i urn inn milium him i iiiiiiiiiii i 

;gtcctgggacgagatgaatgtcctggcgcacggactcctgcagctcggccag( 






Query: 


244 


5 


Sbjct: 


337 


Query: 


304 




Sbjct: 


397 


10 


Query: 


364 




Sbjct: 


457 


15 


Query: 
Sbjct: 


424 
517 




Query: 


484 


20 


Sbjct: 


577 




Query: 


544 


25 


Sbjct: 
Query: 


637 
603 




Sbjct: 


696 


30 


Query: 


655 




Sbjct: 


756 


35 


Query: 
Sbjct: 


714 
815 




Query: 


773 


40 


Sbjct: 


866 




Query: 


831 


45 


Sbjct: 
Query: 


922 
885 




Sbjct: 


982 


50 


Query: 


942 






1038 


55 


Query: 
Sbjct: 


1000 
1092 




Query: 


1059 


60 


Sbjct: 


1146 



GCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCTGGAGCGGCGCCTG AGCGC 303 

II || || | Ml III III II II IIII II lllllllll II II II II I HIM II Ml I III II _ 

GCGCG AACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCTGGAGCGGCGCCTGAGCGC 396 

GTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCTCCCGTTAGCCCCTGAGAG 363 

iiii in ill ill iiiii iii ill ii liiiiiiii ii ii ii iii inn ii i ii i iii ii _ 

GTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCTCCCGTTAGCCCCTGAGAG 456 
CCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACTCAAGGCTCAGAACAGC AG 423 

IIMIIIIIMMM III III IIIIIIIMI II INI I Ml 1 1 1 1 Nil 1 1 1 1 1 1 Mill 

CCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACTCAAGGCTCAGAACAGCAG 516 
GATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCACCTGGAGAAGCAGCACCT 483 

ii mill iii ii mi i in in n iininii n ii ii ii iiiiii 1 1 in i inn m 

GATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCACCTGGAGAAGCAGCACCT 57 6 
GCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGA 543 

Ml MMM M IIIIII Ml IMM MMIIIMIII 1 1 1 IMM I II III I M II 1 1 

GCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGA 636 
GGGTGGC-AAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGG 602 

ii mi ii inn iiiiii inn inn ii ii 1 1 n ii iiiiii iiiini i in n _ 



CTCACAATGTCAGCCGCCTGCACCA- 

Mil IIIIII Mill IIIIII IN 



II 



II 



-TGG--AGGC-TGGACAGTAA-T-TCAGAGGC-G 654 
II III I I III I I I III I 



II I II IIIII I 



II III II II I III I IIIII III II II 



I 

-A- 



II I II I 

-GGCG-CCAC 865 



I II III I I II I III IIIIII I I II IIIII II I 

GATGGCTCAGTGGACTT - CAAC — CGGCCCTGGG AAGCCTAC AAGGCGGGGTT - TGGGGA 

TCTCCGTG-C-AC — CTGGGTGGCGA-GGAC ACGGCCTATAGCCTG - CAGCTCACTGCAC 

ii ii 111 iiii iii iii i ii inn in ii in 



884 



I I III IIII II I III II IIII 

CAGCCGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAAC- 



III IIIII I 

-GCCGAGT- TGC - TGCAGT 1037 



iiiii ii iii ii i 111 iiii i ii mi ii i n 



i i i in ii ii in ii ii ii i m ii ii inn i i i i 



ii i in 



65 



70 



75 



Score = 3048 (457.3 bits), Expect = 7.4e-132, P = 7.4e-132 

Identities = 658/699 (94%), Positives = 658/699 (94%), Strand = Plus / Plus 



Query: 
Sbjct : 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 



541 



754 



TGAGG-GTGGCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACC 599 

II II I IIII I II I I I I I III 11 I I I IIII II 

TGGGGAGAGGCA-GAGTGGACTATTTGAAATCCAGCCTCAGGGGTCTCCGCCATTTTT-- 810 



600 CGGCTCACAATGTCAGCCG-CCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCAC 

II I I I II II I III II II I II II I M I 1 1 1 1 1 1 1 1 1 M I! M M I M 

811 -GG - TGA- ACTGCAAGATGACCT -CAG- ATGGAGGCTGGACAGTAATTC AGAGGCGCCAC 



658 



865 



718 



659 GATGGCTCAGTGGACTTCAACCGGCCCTGGGAAGCCTAC AAGGCGGGGTTTGGGGATCCC 

niiiiiiiinniiniiiniiiiiiniiniiiiiniiiniiiiiiiiinii 

866 GATGGCTCAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCC 925 
719 CACGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGC ATCATGGGGGACCGCAACAGC 778 

MM HUM Mill MM II IIIIII Mill II INI UN II I III I II III 1 1 II c 

926 CACGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGC 985 






Query: 


779 


5 


Sbjct : 


986 




Query: 


839 




Sbjct: 


1046 


10 


Query: 


899 




Sbjct: 


1106 


15 


Query: 


959 




Sbjct: 


1166 




Query: 


1019 


20 


Sbjct: 


1226 




Query: 


1079 


Zd 


Sbjct: 


1286 




Query: 


1139 




Sbjct: 


1346 


30 


Query: 


1199 




Sbjct: 


1406 



CGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTG 838 

1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 II 1 1 1 1 M II 1 1 1 1 II I II 1 1 1 1 1 1 II II 1 1 1 II 1 1 1 1 II I 

CGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTG 1045 
CACCTGGGTGGCGAGGACACGGCCTATAGCCTGCAGCTC ACTGCACCCGTGGCCGGCCAG 898 

1 1 1 1 II II I II! Mill MM III III I MM 1 1 II 1 1 1 II ! 1 1 II 1 1 1 II 1 1 II II II 

CACCTGGGTGGCGAGGAC ACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAG 1105 
CTGGGCGCCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAG 958 

III 1 1 1 i I M II 1 1 1 II 1 1 M MIMI III III III III Mill Mill MM MMM 

CTGGGCGCCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAG 1165 
GATCACGACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTT 1018 

MM 1 1 1 1 1 1 III I III Ml 1 1 1 1 Ml III III 1 1 1 II I II II 1 1 1 I II MIMI 1 1 1 1 

GATCACGACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTT 1225 
GGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGG 1078 

I Ml 1 1 1 1 M I MM 1 1 IMIIMI II III III I M 1 1 1 1 1 1 M II I II II 1 1 1 1 1 1 1 

GGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGG 1285 
CAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAG 1138 

II M M M M II Ml II I MIMMI IMMIM I M 1 1 1 1 1 1 M I! M I M 1 1 1 1 MM 

CAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAG 1345 
GCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGG 1198 

II MM I M I Ml 1 1 III II Ml I II II I II II 1 1 1 1 1 M II 1 1 1 1 1 I M MM M 1 1 1 1 

GCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGG 1405 
GCCTGGTCCCAGGCCCACGAAAGA-GGTGACTCTTGGCTCTG 1239 

Ml Ml II I M Ml I III 1 1 1 1 1 1 IIIIIIIIIIIIIIIM 

GCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTGGCTCTG 1447 



35 



Table 25. BLASTP search using the protein of CuraGen Acc. No. CG57051-05.

>ptnr:SPTREMBL-ACC:Q9HBV4 ANGIOPOIETIN-LIKE PROTEIN PP1158 - Homo sapiens
(Human), 406 aa. (SEQ ID NO:88)
Length = 406

Score = 1015 (357.3 bits), Expect = 1.6e-197, Sum P(2) = 1.6e-197
Identities = 185/192 (96%), Positives = 185/192 (96%)

Query:  177 NVSRLHHGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSIMGDRNSRLA 236
            |      |||||||||||||||||||||||||||||||||||||||||||| ||||||||
Sbjct:  215 NCKMTSDGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 274

Query:  237 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 296
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  275 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 334

Query:  297 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 356
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  335 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 394

Query:  357 MLIQPMAAEAAS 368
            ||||||||||||
Sbjct:  395 MLIQPMAAEAAS 406

Score = 923 (324.9 bits), Expect = 1.6e-197, Sum P(2) = 1.6e-197
Identities = 180/182 (98%), Positives = 180/182 (98%)

Query:    1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60

Query:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEGGKPARRKRLPEMAQPVDPAHNVSR 180
            |||||||||||||||||||||||||||||||||||  |||||||||||||||||||||||
Sbjct:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180

Query:  181 LH 182
            ||
Sbjct:  181 LH 182


Table 26. BLASTN identity search of CuraGen Corporation's Human SeqCalling database using
CuraGen Acc. No. CG57051-05.

>s3aq:217939973, 631 bp. (SEQ ID NO:89)
Length = 631

Minus Strand HSPs:

Score = 2620 (393.1 bits), Expect = 9.1e-113, P = 9.1e-113
Identities = 526/527 (99%), Positives = 526/527 (99%), Strand = Minus / Plus



Query: 


1239 


Sbjct: 


105 


Query: 


1180 


Sbjct: 


165 


Query: 


1120 


Sbjct: 


225 


Query: 


1060 


Sbjct: 


285 


Query: 


1000 


Sbjct: 


345 


Query: 


940 


Sbjct: 


405 


Query: 


880 


Sbjct: 


465 


Query: 


820 


Sbjct: 


525 


Query: 


760 


Sbjct: 


585 



CAGAGCCAAGAGTCACC-TCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 1181 

llliliiilllllllll 1 1 1 1 1 1 1 1 1 1 : M 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 til 

CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 1 64 
GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 1121 

llllll Mill 1 1 1 III I II I II llllll INI llllll Mil III llllll I 

GGCTGCCTCTGCTGCCATGGGCTGGATCAAC ATGGTGGTGGCCTGCAGCGGGTAGTAGCG 224 
GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 1061 

llllll lllllllllllilllllllllllllllllllllllllllll NIIIMIIIIII 

GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 284 
GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACC AGCCTCCAGA 1001 

IIIIIIIIIIIIIIIIIMIIIIIII lllllll llllllllllll lllll lllllllll 

GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 344 



GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 

llliliiilllllllll lllll IIIIIMIIIIIIIIIIIIIIIII llllllllllll 

GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 



941 



404 



881 



GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 

M M I I Ml 1 1 1 1! 1 1 II Ml 1 1 III MM M MM M MM IM M M Ml M M M I 

GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 464 



IIIIIIIIIIIIIMIIIIIIIIII llllllilllllllllllllllllllllll 



CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCAT 

llllll I Mil I llllll lllll I IIIM I IIIIIIIIIIMIII II llllll llllll 

CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCAT 
GATGCTATGCACCTTCTCCAGACCCAGCCAGAACTCGCCGTGGGGAT 714 

1 1 1 IE II I M 1 1 1 1 1 1 1 < I ! 1 1 ^ I i 1 1 1 lllllllll 1 1 1 1 1 1 1 1 1 

GATGCTATGC ACCTTCTCCAGACCCAGCCAGAACTCGCCGTGGGGAT 631 



761 



584 



>s3aq:230121563, 788 bp. (SEQ ID NO:90)
Length = 788

Minus Strand HSPs:

Score = 2583 (387.6 bits), Expect = 3.4e-111, P = 3.4e-111
Identities = 533/548 (97%), Positives = 533/548 (97%), Strand = Minus / Plus

Query: 1239 CAGAGCCAAGAGTCACC-TCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 1181
            ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
Sbjct:  171 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 230








Query: 


1180 




Sbjct: 


231 


5 


Query: 


1120 




Sbjct: 


291 


10 


Query: 
Sbjct: 


1060 
351 




Query: 


1000 


15 


Sbjct: 


411 




Query: 


940 


20 


Sbjct: 
Query: 


471 
880 




Sbjct: 


531 


25 


Query: 


820 




Sbjct: 


591 


30 


Query: 
Sbjct: 


760 
651 




Query: 


700 


35 


Sbjct: 


710 



GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 1121 

HUM Mill II II INI MINI I IIMII M IN I MINN IMIMII HIM II an 

GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 290 



II 



III 



Mini 



iiiimmi iiiiiiimiiiiimimiiiii 



iMiimiimii mum ii mi MiiMiiiiii mm inn 



GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 

1 1 1 II 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 M 1 11 1 1 M 1 1 II Ill 

GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 
GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 
AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 

1 1 1 1 1 M I II 1 1 1 II III 1 1 III I II 1 1 1 1 1 1 II I II 1 1 II I i 1 1 II III II II I ill I 

AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 
CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCAT 

I M 1 1 1 1 1 M 1 1 1 1 1 ! II i I F M 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 M I II II I I 

CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCGT 
GATGCTATGCACCTTCTCCAGACCCAGCCAGAACTCGCCGTGGGGATCCCCAAACCCCGC 

1 1 1 1 1 1 r 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ; i J 1 1 1 1 1 1 i I ill I I II I I 

GATGCTATGC ACCTTCTCCAGACCCAGCCAGAACTCGCC - TGGAGTGGGAGAGGCCACTC 



I II I I I I 



941 
470 



881 



530 



821 



590 



761 



650 



701 



709 



40 



45 



55 



60 



65 



>s3aq:217940431 Category E: , 530 bp. (SEQ ID NO:91)
Length = 530

Minus Strand HSPs:

Score = 1795 (269.3 bits), Expect = 2.0e-75, P = 2.0e-75
Identities = 381/399 (95%), Positives = 381/399 (95%), Strand = Minus / Plus

Query: 553 CTTGCCACCCTCATGGTCTAGGTG-CTT- GTGGTCCAG - GAGGCC AAACTGGCTTTGCAG 497 

II I I II III I II I III II II HIM INI I II Mill 

CTGGTCCCCGTCA-G-TCAATGTGACTGAGTCCGCCATTGAGGCCAGTCTGGCTTTGCAG 



Sbjct: 



132 



189 



Qu ery : 496 ATGCTGAATTCGC AGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAA 

1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Sb j C t : 190 ATGCTGAATTCGC AGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAA 



437 



249 



50 Query: 436 GAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTC 377 

lllllllllllllllllllllllllllllllllllllllllllillllllllllllllll 

250 GAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTC 309 



Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 



376 



317 



AGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACA 

III II II III IIMII III I III III III III I III III III III I II I III MM II 1 1 

310 AGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACA 369 



316 GGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTC 

II III 1 1 1 1 II II II III II 1 1 MM I I 1 1 1 1 1 1 1 1 1 II M I II III II I II 1 1 IN 

370 GGCGGACCCGCACGCGCTCAGGCGCCGTTTC AGCGCGCTCAGCTGACTGCGGGTGCGCTC 



257 



429 



256 CGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTC 197 

I M 1 1 M 1 1 M 1 1 M M I II 1 1 1 1 1 M 1 1 1 M I M ! I M I i 1 1 1 1 1 M I II II M II I M 

430 CGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTC 489 
196 GTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 156 

1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I ! 1 1 1 1 1 1 1 1 1 1 1 

490 GTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 530 



>s3aq:217940613, 336 bp. (SEQ ID NO:92)
Length = 336

Minus Strand HSPs:

Score = 995 (149.3 bits), Expect = 9.4e-56, Sum P(2) = 9.4e-56
Identities = 203/204 (99%), Positives = 203/204 (99%), Strand = Minus / Plus

GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 567 

Mill III II Mill MINI II1IIIIIMMI Mill IMIIillimiMIMMI 

GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 192 
TTCTTCGGGCAGGCTTG - CCACCCTC ATGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAAC 508 

10 MM III IMIIII II Mill MIIMIMI Mill II Ml Mill IMIMIMM 



20 



35 



40 



55 



Identities 


Query: 


626 


Sbjct: 


133 


Query: 


566 


Sbjct: 


193 


Query: 


507 


Sbjct: 


252 


Query: 


447 


Sbjct: 


312 


Query:  507 TGGCTTTGCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCC 448
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  252 TGGCTTTGCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCC 311

Query:  447 ACCTTGTGGAAGAGTTGCTGGATCC 423
            |||||||||||||||||||||||||
Sbjct:  312 ACCTTGTGGAAGAGTTGCTGGATCC 336

Score = 410 (61.5 bits), Expect = 9.4e-56, Sum P(2) = 9.4e-56 (SEQ ID NO:129)
Identities = 86/91 (94%), Positives = 86/91 (94%), Strand = Minus / Plus

Query:  717 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 658
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 60

Query:  657 TGGCGCCTCTGAATTACTGTCCAGCCTCCAT 627
            |||||||||||||||| ||||||  || | |
Sbjct:   61 TGGCGCCTCTGAATTAATGTCCACTCTGCCT 91



>s3aq:217939964, 328 bp. (SEQ ID NO:93)
Length = 328

Plus Strand HSPs:

Score = 762 (114.3 bits), Expect = 1.5e-28, P = 1.5e-28
Identities = 156/159 (98%), Positives = 156/159 (98%), Strand = Plus / Plus

Query: 1082 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 1141
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 60

Query: 1142 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 1201
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 120

Query: 1202 TGGTCCCAGGCCCACGAAAGA-GGTGACTCTTGGCTCTG 1239
            |||||||||||| |||||||| ||||||||||||||| |
Sbjct:  121 TGGTCCCAGGCCAACGAAAGACGGTGACTCTTGGCTCCG 159
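
For the "Minus / Plus" HSPs in Table 26, the query coordinates run downward because the query row is the reverse complement of the sequence as numbered in the clone. The sketch below, with a hypothetical function name, shows the operation on the first 25 bases of the Sbjct row that begins at position 105 of s3aq:217939973; the result can be compared by eye with the final bases of the plus-strand query in Table 24.

```python
# Complement of each unambiguous DNA base; other symbols are rejected.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    try:
        return "".join(_COMPLEMENT[base] for base in reversed(dna.upper()))
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]!r}") from None


# First 25 bases of Sbjct 105-164 (minus-strand orientation of the query).
fragment = "CAGAGCCAAGAGTCACCGTCTTTCG"
print(reverse_complement(fragment))
```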



Table 27. ClustalW alignment of CG57051-05 protein with related proteins.

Information for the ClustalW proteins:

Accno                       Common Name                             Length
CG57051-05 (SEQ ID NO:53)   novel Angiopoietin-like protein         368
CG57051-04 (SEQ ID NO:51)   Angiopoietin-like protein- isoform 4    242
CG57051-02 (SEQ ID NO:55)   Angiopoietin-like protein- isoform 2    386
Q9HBV4 (SEQ ID NO:80)       ANGIOPOIETIN-LIKE PROTEIN PP1158.       406



In the alignment shown above, black outlined amino acid residues indicate residues
identically conserved between sequences (i.e., residues that may be required to preserve
structural or functional properties); amino acid residues with a gray background are similar to
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar
between sequences.

Table 28. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-05.

outside --- Certainty=0.7332(Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.2608(Affirmative) < succ>
endoplasmic reticulum (membrane) --- Certainty=0.1000(Affirmative) < succ>
endoplasmic reticulum (lumen) --- Certainty=0.1000(Affirmative) < succ>

Is the sequence a signal peptide?

# Measure   Position   Value   Cutoff   Conclusion
max. C      31         0.306   0.37     NO
max. Y      26         0.429   0.34     YES
max. S      8          0.952   0.88     YES
mean S      1-25       0.848   0.48     YES

# Most likely cleavage site between pos. 25 and 26: AQG-GP
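
The YES/NO calls in the SignalP block of Table 28 appear to follow from comparing each measure with its cutoff. The following is a minimal sketch under that assumption (a measure is called YES when its value exceeds the cutoff); the names `TABLE_28` and `conclusions` are illustrative only and are not part of SignalP.

```python
# Values copied from the SignalP block of Table 28 (CG57051-05).
# Assumption: a measure is reported as "YES" when value > cutoff.
TABLE_28 = [
    # (measure, position, value, cutoff)
    ("max. C", "31", 0.306, 0.37),
    ("max. Y", "26", 0.429, 0.34),
    ("max. S", "8", 0.952, 0.88),
    ("mean S", "1-25", 0.848, 0.48),
]


def conclusions(rows):
    """Return {measure: "YES" or "NO"} by comparing each value with its cutoff."""
    return {
        measure: ("YES" if value > cutoff else "NO")
        for measure, _position, value, cutoff in rows
    }


if __name__ == "__main__":
    for measure, call in conclusions(TABLE_28).items():
        print(f"{measure:7s} {call}")
```

Under this reading only the max. C measure falls below its cutoff, which matches the single NO in the table.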

SECP 17 

A SECP17 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:54) and encoded polypeptide sequence (SEQ ID NO:55) of clone
CG57051-02, directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 22 illustrates the nucleic acid and amino acid sequences,
respectively. This clone includes a nucleotide sequence (SEQ ID NO:54) of 1315 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 1313-1315.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein having 386 amino acid residues is
presented using the one-letter code in Figure 22. The protein encoded by clone CG57051-02 is
predicted by the PSORT program to be located extracellularly with a certainty of 0.7332 and has
a signal peptide (see Table 33 below). The PCR product derived by exon linking, covering the
entire open reading frame, was cloned into the pCR2.1 vector from Invitrogen to provide clone
157544::CG50847-01.891637.M13 and clone 157544::CG50847-01.891637.O5. SeqCalling
procedures were also utilized to identify CG57051-02, and the following public components
were thus included in the invention: gb_accno: AC010323 Homo sapiens chromosome 19 clone
CTD-2550O8, WORKING DRAFT SEQUENCE, 55 unordered pieces. In addition, the
following CuraGen Corporation SeqCalling Assembly IDs were also included in the invention:
162377751. The DNA and protein sequences for the novel Angiopoietin-like gene are reported
here as CuraGen Acc. No. CG57051-02.
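
The ORF coordinates and the stated protein length can be checked against each other. The sketch below assumes 1-based, inclusive nucleotide coordinates with the stop codon counted inside the ORF, as in the paragraph above; the function name `orf_protein_length` is hypothetical.

```python
def orf_protein_length(start, stop_end):
    """Number of residues encoded by an ORF.

    start    -- position of the first base of the initiation codon (1-based)
    stop_end -- position of the last base of the stop codon (1-based)
    """
    n_bases = stop_end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    # The final codon is the stop codon and encodes no residue.
    return n_bases // 3 - 1


if __name__ == "__main__":
    # CG57051-02: ATG at 155-157, TAG at 1313-1315.
    print(orf_protein_length(155, 1315))  # 386
```

For CG57051-02 the ORF spans 1161 bases, that is 387 codons including the stop codon, which agrees with the 386 residues stated.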

Similarities

CG57051-04 is directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 20 illustrates the nucleic acid and amino acid sequences,
respectively. This clone includes a nucleotide sequence (SEQ ID NO:50) of 937 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 881-883.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein having 242 amino acid residues is
presented using the one-letter code in Figure 20. The protein encoded by clone CG57051-04 is
predicted by the PSORT program to be located at the endoplasmic reticulum with a certainty of
0.8200, and appears to be a signal protein (see Table 27 below).

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 696 of 700 bases (99%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (Table 29). The full amino acid sequence of the protein
of the invention was found to have 179 of 182 amino acid residues (98%) identical to, and 180 of
182 amino acid residues (98%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9NZU4 protein from Homo sapiens (Human) (HEPATIC ANGIOPOIETIN-RELATED
PROTEIN) (Table 30).
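
The percentages quoted with these match counts look truncated rather than rounded (180 of 182 is about 98.9%, yet it is reported as 98%). The sketch below reproduces the quoted figures under that assumption; truncation is an inference from the numbers in this document, not a statement about how the search program computes them, and `percent_truncated` is an illustrative name.

```python
def percent_truncated(matches, aligned):
    """Whole-number percentage of matching positions, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return 100 * matches // aligned


if __name__ == "__main__":
    # (matches, aligned length) pairs quoted in the text above.
    for matches, aligned in [(696, 700), (179, 182), (180, 182)]:
        print(f"{matches}/{aligned} -> {percent_truncated(matches, aligned)}%")
```

With rounding instead of truncation, 180/182 would come out as 99%, which is not what the text reports.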

A multiple sequence alignment is given in Table 32, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences.

The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the Interpro domain accession number. Significant domains are summarized below:

IPR002181: Fibrinogen [1], the principal protein of vertebrate blood clotting, is a
hexamer containing two sets of three different chains (alpha, beta, and gamma), linked to each
other by disulfide bonds. The N-terminal sections of these three chains are evolutionarily related
and contain the cysteines that participate in the cross-linking of the chains. However, there is no
similarity between the C-terminal part of the alpha chain and that of the beta and gamma chains.
The C-terminal part of the beta and gamma chains forms a domain of about 270 amino-acid
residues. As shown in the schematic representation, this domain contains four conserved
cysteines involved in two disulfide bonds.

'C': conserved cysteine involved in a disulfide bond. (SEQ ID NO:126)

Such a domain has been recently found [2] in other proteins, which are listed below.

Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins, of
about 260 amino acids, which have a fibrinogen beta/gamma C-terminal domain.
In the C-terminus of Drosophila protein scabrous (gene sca). Scabrous is involved in the
regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.
In the C-terminus of a mammalian T-cell specific protein of unknown function.
In the C-terminus of a human protein of unknown function which is encoded on the opposite
strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested [2] that it could
be involved in protein-protein interactions.

This indicates that the sequence of the invention has properties similar to those of other
proteins known to contain this/these domain(s) and similar to the properties of these domains.

Chromosomal information:

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19q13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen
Corporation's Electronic Northern bioinformatic tool.

Tissue expression 

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: adipocytes. Expression information was derived from the tissue sources of the
sequences that were included in the derivation of the sequence of CuraGen Acc. No.
CG57051-02.

Cellular Localization and Sorting 

The PSORT, SignalP and hydropathy profile for the Angiopoietin-like protein are shown
in Table 33. Although PSORT suggests that the Angiopoietin-like protein may be localized in
the nucleus, the protein of CuraGen Acc. No. CG57051-02 predicted here is similar to the
Angiopoietin family, some members of which are secreted. Therefore it is likely that this novel
Angiopoietin-like protein is localized to the same sub-cellular compartment.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes
the nucleic acid whose sequence is provided in Figure 22, or a fragment thereof. The invention
also includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 22 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-02, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 1% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 22. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 22 while still encoding
a protein that maintains its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 2% of the amino acid
residues may be so changed.
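
To give a sense of scale for these limits, the sketch below converts a percentage into a maximum number of changed positions and checks an equal-length variant against it. It is illustrative only: the text says "up to about", so treating the percentage as a hard, truncated cutoff is an assumption, and the names `max_changes` and `within_limit` are hypothetical.

```python
def max_changes(length, percent):
    """Largest whole number of positions not exceeding `percent` of `length`."""
    return length * percent // 100


def within_limit(reference, variant, percent):
    """True if an equal-length variant differs at no more than max_changes positions."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    differences = sum(a != b for a, b in zip(reference, variant))
    return differences <= max_changes(len(reference), percent)


if __name__ == "__main__":
    print(max_changes(1315, 1))  # 13 bases of the 1315 bp nucleotide sequence
    print(max_changes(386, 2))   # 7 residues of the 386-residue protein
```

Variants containing insertions or deletions would need an alignment step first, which this sketch does not attempt.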

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Angiopoietin family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as a research tool. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes, obesity,
colon cancer, DIABETES MELLITUS, INSULIN-RESISTANT, WITH ACANTHOSIS
NIGRICANS AND HYPERTENSION; 3-methylglutaconicaciduria, type III; Cone-rod retinal
dystrophy-2; DNA ligase I deficiency; Glutaricaciduria, type IIB; Liposarcoma; Myotonic
dystrophy as well as other diseases, disorders and conditions.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in diagnostic and/or
therapeutic methods.

Table 29. BLASTN search using CuraGen Acc. No. CG57051-02.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO:94)
Length = 1943

Plus Strand HSPs:

Score = 3448 (517.3 bits), Expect = 8.3e-233, Sum P(2) = 8.3e-233
Identities = 696/700 (99%), Positives = 696/700 (99%), Strand = Plus / Plus

Query:    2 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 61
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   20 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 79

Query:   62 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 121
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   80 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 139

Query:  122 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 181
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  140 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 199

Query:  182 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCT-AGATCTGGACCCGTGCA 240
            |||||||||||||||||||||||||||||||||||||||||| ||  | |||||||||||
Sbjct:  200 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGC-GGACCCGTGCA 258

Query:  241 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  259 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 318

Query:  301 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  319 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 378

Query:  361 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  379 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 438

Query:  421 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  439 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 498

Query:  481 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  499 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 558

Query:  541 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 600
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  559 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 618

Query:  601 CAAGCACCTAGACCATGAGGTGGCCAAACCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 660
            ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
Sbjct:  619 CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 678

Query:  661 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACC 701
            |||||||||||||||||||||||||||||||||||||||||
Sbjct:  679 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACC 719

Score = 1887 (283.1 bits), Expect = 8.3e-233, Sum P(2) = 8.3e-233
Identities = 399/415 (96%), Positives = 399/415 (96%), Strand = Plus / Plus

Query:  694 CCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCTCAATGGACTTCAA 753
            ||| ||  ||||||||||||||||||||||||||||||||||||||||| ||||||||||
Sbjct:  828 CCT-CAG-ATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCTCAGTGGACTTCAA 885

Query:  754 CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 813
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  886 CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 945

Query:  814 TCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 873
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  946 TCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 1005

Query:  874 GGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGACAC 933
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1006 GGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGACAC 1065

Query:  934 GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 993
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1066 GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 1125

Query:  994 ACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 1053
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1126 ACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 1185

Query: 1054 [...]
Sbjct: 1186 [...]

Score = 936 (140.4 bits), Expect = 6.1e-190, Sum P(2) = 6.1e-190
Identities = 312/407 (76%), Positives = 312/407 (76%), Strand = Plus / Plus

Query:  909 CCGTGCACCTGGGTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCG 968
            ||||||| ||| | | |  |||   |||  |  ||| | || |  |||||    |  |||
Sbjct:  993 CCGTGCAGCTGCGGGACTGGGAT--GGCA-AC-GCC-G-AGTTG-CTGCAGTTCT--CCG 1043

Query:  969 GCCAGCTGGGCGCC-ACCAC-CGTCCCAC--CCAGCGGCCTCTCCGTACCCTTCTCCACT 1024
Sbjct: 1044 [...]

Query: 1025 [...]
Sbjct: 1103 [...]

Query: 1083 [...]GGCTCAAAGACCTGACCATGTTCCCT--CTCC-CCT-GACCCCGGCAGGA 1135
Sbjct: 1155 CTTGGGACCAGGATCAC-GACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGA 1213

Query: 1136 GGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATC 1195
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1214 GGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATC 1273

Query: 1196 CCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTAC 1255
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1274 CCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTAC 1333

Query: 1256 TACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAG 1315
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1334 TACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAG 1393

Table 30. BLASTP search using the protein of CuraGen Acc. No. CG57051-02.

>ptnr:SPTREMBL-ACC:Q9NZU4 HEPATIC ANGIOPOIETIN-RELATED PROTEIN - Homo sapiens
(Human), 406 aa. (SEQ ID NO:95)
Length = 406

Score = 919 (323.5 bits), Expect = 4.9e-194, Sum P(3) = 4.9e-194
Identities = 179/182 (98%), Positives = 180/182 (98%)

Query:    1 MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60
            |||||||||||||||||||||||+ |||||||||||||||||||||||||||||||||||
Sbjct:    1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60

Query:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180
            |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Sbjct:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPPHNVSR 180

Query:  181 LH 182
            ||
Sbjct:  181 LH 182

Score = 670 (235.9 bits), Expect = 4.9e-194, Sum P(3) = 4.9e-194
Identities = 123/132 (93%), Positives = 124/132 (93%)

Query:  177 NVSRLHHGGWTVIQRRHDGSMDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 236
            |      |||||||||||||+|||||||||||||||||||||||||||||| ||||||||
Sbjct:  215 NCKMTSDGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSIMGDRNSRLA 274

Query:  237 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 296
            ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Sbjct:  275 VQLRDWDGNAELLQFSVHLGGEDTAYSLQFTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 334

Query:  297 LRRDKNCAKSLS 308
            ||||||||||||
Sbjct:  335 LRRDKNCAKSLS 346

Score = 331 (116.5 bits), Expect = 4.9e-194, Sum P(3) = 4.9e-194
Identities = 59/61 (96%), Positives = 60/61 (98%)

Query:  326 AGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAAEAA 385
            +|||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
Sbjct:  346 SGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYSLQATTMLIQPMAAEAA 405

Query:  386 S 386
            |
Sbjct:  406 S 406

Score = 46 (16.2 bits), Expect = 5.9e-33, Sum P(2) = 5.9e-33
Identities = 14/40 (35%), Positives = 19/40 (47%)

Query:  255 [...]
Sbjct:    1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDE 40

Score = 45 (15.8 bits), Expect = 7.6e-33, Sum P(2) = 7.6e-33
Identities = [...]

Query:    1 MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDE 40
Sbjct:  293 [...]
Table 31. BLASTN identity search of CuraGen Corporation's Human SeqCalling
database using CuraGen Acc. No. CG57051-02.

>s3aq:162377751 Category D: 1920 bp. (SEQ ID NO:96)
Length = 1920

Minus Strand HSPs:

Score = 3448 (517.3 bits), Expect = 1.5e-233, Sum P(2) = 1.5e-233
Identities = [...], Strand = Minus / Plus

Query:  701 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 642
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1221 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 1280

Query:  641 TTCTTCGGGCAGGTTTGGCCACCTCATGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACT 582
            ||||||||||||| |||||||||||||||||||||||||||||||||||||||| |||||
Sbjct: 1281 TTCTTCGGGCAGGCTTGGCCACCTCATGGTCTAGGTGCTTGTGGTCCAGGAGGCGAAACT 1340

Query:  581 GGCTTTGCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCA 522
Sbjct: 1341 [...] 1400

Query:  521 [...]
Sbjct: 1401 [...]

Query:  461 GAAGGACCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGG 402
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1461 GAAGGACCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGG 1520

Query:  401 TTCCCTGACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGC 342
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1521 TTCCCTGACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGC 1580

Query:  341 GGGTGCGCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGA 282
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1581 GGGTGCGCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGA 1640

Query:  281 CATTCATCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCCAGATCT-A 223
            |||||||||||||||||||||||||||||||||||||||||||||||||||| |  || |
Sbjct: 1641 CATTCATCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCC-GCCCTGA 1699

Query:  222 GCGCTCAGTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCA 163
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1700 GCGCTCAGTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCA 1759

Query:  162 CCGCTCATCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCTCGGGGACGTTGGGG 103
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1760 CCGCTCATCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCTCGGGGACGTTGGGG 1819

Query:  102 TTCCAGGTGCGAGGACTGGAGACGCGGAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAA 43
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1820 TTCCAGGTGCGAGGACTGGAGACGCGGAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAA 1879

Query:   42 [...]
Sbjct: 1880 [...]

Score = 1887 (283.1 bits), Expect = 1.5e-233, Sum P(2) = 1.5e-233 (SEQ ID NO:130)
Identities = 399/415 (96%), Positives = 399/415 (96%), Strand = Minus / Plus

Query: 1108 ATGG-T-CAGGTCTTTGAGCCACCGATGGGGCAGAGAGGCTCTTGGCGCAGTTCTTGTCC 1051
            |||| | |||||     | ||||| |     |||||||||||||||||||||||||||||
Sbjct:  700 ATGGCTGCAGGTGCCAAA-CCACC-AGCCTCCAGAGAGGCTCTTGGCGCAGTTCTTGTCC 757

Query: 1050 CTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAAGGGTACGGAGAGGCCGCTGGGTGGG 991
Sbjct:  758 [...]

Query:  990 ACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGCAGTGAGCTGCAGGCTATAGGCCGTG 931
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  818 ACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGCAGTGAGCTGCAGGCTATAGGCCGTG 877

Query:  930 TCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAACTCGGCGTTGCCATCCCAGTCCCGC 871
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  878 TCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAACTCGGCGTTGCCATCCCAGTCCCGC 937

Query:  870 AGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCGTGATGCTATGCACCTTCTCCAGACCC 811
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  938 AGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCGTGATGCTATGCACCTTCTCCAGACCC 997

Query:  810 AGCCAGAACTCGCCGTGGGGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTG 751
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  998 AGCCAGAACTCGCCGTGGGGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTG 1057

Query:  750 AAGTCCATTGAGCCATCGTGGCGCCTCTGAATTACTGTCCAGCCTCCATGGTGCAGG 694
Sbjct: 1058 [...] 1112

Score = 936 (140.4 bits), Expect = 1.1e-190, Sum P(2) = 1.1e-190 (SEQ ID NO:131)
Identities = 312/407 (76%), Positives = 312/407 (76%), Strand = Minus / Plus

Query: 1315 CTAGGAGGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTA 1256
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  547 CTAGGAGGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTA 606

Query: 1255 GTAGCGGCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGG 1196
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  607 GTAGCGGCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGG 666

Query: 1195 GATGGAGCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCC 1136
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  667 GATGGAGCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCC 726

Query: 1135 TCCTGCCGGGGTCAGGG-G-AGAGG--GAACATGGTCAGGTCTTTGAGCCA---CCGATG 1083
Sbjct:  727 TCCAGAGAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGT-GATCCTGGTCCCAAG 785

Query: 1082 GGGCAGAGAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGA-GGTCGTGAT-CCTGGTCCCA 1025
Sbjct:  786 [...]

Query: 1024 [...]
Sbjct:  838 [...]

Query:  968 [...]
Sbjct:  897 CGGAGAAC--TGCAGGAA-CT-C-GGCGTT--GCCATC-CCAGTCC-CGCAGCTGCACGG 947



Table 32. ClustalW alignment of CG57051-02 protein with related proteins.

Information for the ClustalW proteins:

Accno                        Common Name                             Length
CG57051-02 (SEQ ID NO:55)    novel Angiopoietin-like protein         386
Q9NZU4 (SEQ ID NO:95)        HEPATIC ANGIOPOIETIN-RELATED PROTEIN.   406



In the alignment shown above, black outlined amino acid residues indicate residues
identically conserved between sequences (i.e., residues that may be required to preserve
structural or functional properties); amino acid residues with a gray background are similar to
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar
between sequences.

Table 33. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-02.

endoplasmic reticulum (membrane) --- Certainty=0.8200(Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.3008(Affirmative) < succ>
plasma membrane --- Certainty=0.1900(Affirmative) < succ>
endoplasmic reticulum (lumen) --- Certainty=0.1000(Affirmative) < succ>

INTEGRAL Likelihood = -4.04 Transmembrane 7-23 (4-25)

Seems to be a Type Ib (Nexo Ccyt) membrane protein

Is the sequence a signal peptide?

# Measure   Position   Value   Cutoff   Conclusion
max. C      31         0.427   0.37     YES
max. Y      31         0.473   0.34     YES
max. S      8          0.952   0.88     YES
mean S      1-30       0.738   0.48     YES

# Most likely cleavage site between pos. 30 and 31: VQS-KS

SECP 18 

A SECP18 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:56) and encoded polypeptide sequence (SEQ ID NO:57) of clone
CG57051-03, directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 23 illustrates the nucleic acid and amino acid sequences,
respectively. This clone includes a nucleotide sequence (SEQ ID NO:56) of 1150 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 44-46 and ending with a TAG stop codon at nucleotides 1148-1150.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein having 368 amino acid residues is
presented using the one-letter code in Figure 23.

The protein encoded by clone CG57051-03 is predicted by the PSORT program to be
located extracellularly with a certainty of 0.7332 and has a signal peptide (see Table 38 below).
The PCR product derived by exon linking, covering the entire open reading frame, was cloned
into the pCR2.1 vector from Invitrogen to provide clone
134276::130294:JPPAR-gamma.698782.P15. The DNA and protein sequences for the novel
Angiopoietin-like gene are reported here as CuraGen Acc. No. CG57051-03.

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 837 of 1031 bases (81%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (Table 34). The full amino acid sequence of the protein of
the invention was found to have 184 of 192 amino acid residues (95%) identical to, and 184 of
192 amino acid residues (95%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9HBV4 protein from Homo sapiens (Human) (ANGIOPOIETIN-LIKE PROTEIN
PP1158) (Table 35).
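
The percentages quoted with each identity count can be reproduced from the counts themselves. The sketch below is illustrative only; the function name is hypothetical, and the use of truncation rather than rounding is an assumption inferred from the figures in this document (for example, 184/192 is 95.8% but is reported as 95%).

```python
def blast_percent(identical: int, aligned: int) -> int:
    """Whole-number percent identity, truncated toward zero (assumed convention)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * identical) // aligned


# Identity counts quoted in the text and in Tables 34-36.
for identical, aligned in [(837, 1031), (184, 192), (562, 568), (547, 550)]:
    print(f"{identical}/{aligned} ({blast_percent(identical, aligned)}%)")
```

Under this assumption the four pairs print as 81%, 95%, 98% and 99%, matching the values reported in the tables.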

A multiple sequence alignment is given in Table 37, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences. Please note this sequence represents a splice form of Angiopoietin as
indicated in positions 183 to 221.

The presence of identifiable domains in the protein disclosed herein was determined by 
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then 
identified by the Interpro domain accession number. Significant domains are summarized below: 



IPR002181; (Fibrinogen_C) 

Fibrinogen, the principal protein of vertebrate blood clotting, is a hexamer containing
two sets of three different chains (alpha, beta, and gamma), linked to each other by disulfide
bonds. The N-terminal sections of these three chains are evolutionarily related and contain the
cysteines that participate in the cross-linking of the chains. However, there is no similarity
between the C-terminal part of the alpha chain and that of the beta and gamma chains. The
C-terminal part of the beta and gamma chains forms a domain of about 270 amino-acid residues.
As shown in the schematic representation, this domain contains four conserved cysteines
involved in two disulfide bonds.

(SEQ ID NO: 126)

xxxxCxxxxxxxxxxxxCxxxxxxxx...
    |            |
    +------------+

'C': conserved cysteine involved in a disulfide bond.
Such a domain has been recently found in other proteins which are listed below: 

1) Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins
of about 260 amino acids, which have a fibrinogen beta/gamma C-terminal domain.

2) In the C-terminus of Drosophila protein scabrous (gene sca). Scabrous is involved in
the regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.

3) In the C-terminus of a mammalian T-cell specific protein of unknown function.

4) In the C-terminus of a human protein of unknown function which is encoded on the
opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested that it could be
involved in protein-protein interactions.

This indicates that the sequence of the invention has properties similar to those of other 
proteins known to contain this/these domain(s) and similar to the properties of these domains. 

Chromosomal information: 

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19p13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen
Corporation's Electronic Northern bioinformatic tool.

Tissue expression

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: Adipose, Liver, Placenta. Expression information was derived from the tissue
sources of the sequences that were included in the derivation of the sequence of CuraGen Acc.
No. CG57051-03.

Cellular Localization and Sorting 

The PSORT, SignalP and hydropathy profile for the Angiopoietin-like protein are shown
in Table 38. The results predict that this sequence has a signal peptide and is likely to be
localized extracellularly with a certainty of 0.7332. The signal peptide is predicted by SignalP to
be cleaved between amino acids 25 and 26: AQG-GP.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the
nucleic acid whose sequence is provided in Figure 23, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 23 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-03, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 19% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 23. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 23 while still providing
a protein that maintains its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 5% of the amino acid
residues may be so changed.
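
The "up to about 19% of the bases" and "up to about 5% of the amino acid residues" limits can be read as a fraction of changed positions. The following is a minimal sketch under stated assumptions: the function names are hypothetical, only substitutions between equal-length sequences are counted (insertions and deletions are ignored), and the example variant is built here by substituting one base of a 60-nt stretch taken from Table 34, mirroring the single C/G difference seen in that alignment.

```python
def fraction_changed(reference: str, variant: str) -> float:
    """Fraction of positions that differ between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only compares equal-length sequences")
    if not reference:
        raise ValueError("sequences must be non-empty")
    changed = sum(a != b for a, b in zip(reference, variant))
    return changed / len(reference)


def within_limit(reference: str, variant: str, max_fraction: float) -> bool:
    """True if the variant changes no more than max_fraction of the positions."""
    return fraction_changed(reference, variant) <= max_fraction


# 60-nt stretch of the query sequence shown in Table 34.
reference = "TCTGGAGAAGGTCCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG"
# Illustrative variant: position 13 changed from C to G.
variant = reference[:12] + "G" + reference[13:]

print(round(fraction_changed(reference, variant), 4))
print(within_limit(reference, variant, 0.19))
```

The same helper can be applied to protein sequences with a limit of 0.05.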



Chimeric and Fusion Proteins 

The present invention includes chimeric or fusion proteins of the Angiopoietin-like
protein, in which the Angiopoietin-like protein of the present invention is joined to a second
polypeptide or protein that is not substantially homologous to the present novel protein. The
second polypeptide can be fused to either the amino-terminus or carboxyl-terminus of the present
CG57051-03 polypeptide. In certain embodiments a third nonhomologous polypeptide or protein
may also be fused to the novel Angiopoietin-like protein such that the second nonhomologous
polypeptide or protein is joined at the amino terminus, and the third nonhomologous polypeptide
or protein is joined at the carboxyl terminus, of the CG57051-03 polypeptide. Examples of
nonhomologous sequences that may be incorporated as either a second or third polypeptide or
protein include glutathione S-transferase, a heterologous signal sequence fused at the amino
terminus of the Angiopoietin-like protein, an immunoglobulin sequence or domain, a serum
protein or domain thereof (such as a serum albumin), an antigenic epitope, and a specificity
motif such as (His)6.

The invention further includes nucleic acids encoding any of the chimeric or fusion 
proteins described in the preceding paragraph. 

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Fibrinogen family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as a research tool. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein are to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes;
obesity; colon cancer; diabetes mellitus, insulin-resistant, with acanthosis nigricans and
hypertension; 3-methylglutaconicaciduria, type III; cone-rod retinal dystrophy-2; DNA ligase I
deficiency; glutaricaciduria, type IIB; liposarcoma; and myotonic dystrophy, as well as other
diseases, disorders and conditions.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in diagnostic and/or
therapeutic methods.

Table 34. BLASTN search using CuraGen Acc. No. CG57051-03.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO: 97)
Length = 1943

Plus Strand HSPs:
Score = 2967 (445.2 bits), Expect = 3.2e-128, P = 3.2e-128
Identities = 837/1031 (81%), Positives = 837/1031 (81%), Strand = Plus / Plus



Query: 1      Sbjct: 130
Query: 61     Sbjct: 190
Query: 121    Sbjct: 250
Query: 181    Sbjct: 310
Query: 241    Sbjct: 370



30 ' 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

CCCCGAGAGTCCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGAC 
3GCCGGGGCAGCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGCGG 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIM 

3GCCGGGGCAGCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGCGG 
ACCCGTGCAGTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCA 

IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIII 

ftCCCGTGCAGTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCA 

CGGACTCCTGCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCT 
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIM 



IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIII 

GAGCGCGCTGGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTC 


Query: 301    Sbjct: 430
Query: 361    Sbjct: 490
Query: 421    Sbjct: 550
Query: 481    Sbjct: 610
Query: 541    Sbjct: 670
Query: 597    Sbjct: 730
Query: 653    Sbjct: 789
Query: 711    Sbjct: 848
Query: 771    Sbjct: 897
Query: 823    Sbjct: 955
Query: 879    Sbjct: 1015
Query: 939    Sbjct: 1066
Query: 996    Sbjct: 1121



CACCGACCTCCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCA 360 

II III II III I II III Mill II III I III I II I II 1 1 III Mill I III III II III II 

CACCGACCTCCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCA 489 
GACACAACTCAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCA 420 

1 1 1 1 1 1 1 1 1 1 1 i i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 

GACACAACTCAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCA 549 
GCAGCGGCACCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCT 480 

1 1 II I M M M I M I II I M II 1 1 M 1 1 M 1 1 M 1 1 II 1 1 M II M M I M M I M 1 1 M 

GCAGCGGCACCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCT 609 
CCTGGACCACAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCC 540 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

CCTGGACCACAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCC 669 



CGAGATGGCCCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCA--TGG— AG 596 

IIIIMMMIM IIMMIIIIMIIII IIIIIIIIMMIIIIIIIM II II 

CGAGATGGCCCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCGGCTGCCCAG 



729 



I I I III I I I III I II Ml MINIM 



II 



II 



I II Mill II I II I II II I III I HIM I 



AGGTCCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCGGGACTGGG 770 

i i ii ii i ii i ii i i ii ill i i i mi 

-G-TA-ATT-CAG-A— GGCG-CCACGATGGCTCAGTGGACTT-CAAC--CGGCCCTGGG 896 
ATG ACAACGCCGAGTTGCTGCAGTTCTC-CGTGC-AC- -CTGGGTGGCGA-GGACAC 822 

ii mi ii i iii i i i i i ii ii i mi in in i 



ii mill ii ii in i i in mi in in n 



i ii i n i i i i i i in ii ii in ii i in ii 



878 



C- 1065 



II I II llll II I II I I I III II II III II II II 



i iii ii ii inn i i i i ii i iii 




Score = 2774 (416.2 bits), Expect = 1.6e-119, P = 1.6e-119 

Identities = 562/568 (98%), Positives = 562/568 (98%), Strand = Plus / Plus 

CCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCTCAGTGGACTTCAA 642 

Ml 1 1 I II II II III llll Ml 1 1 1 1 II 1 1 1 II 1 1 1 III II 1 1 1 III III II II ! I 

CCT-CAG-ATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCTCAGTGGACTTCAA 885 
CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 702 

1 1 1 1 1 1 i I i 1 1 ) 1 1 i 1 1 f 1 1 1 1 1 1 1 1 1 1 1 M M 1 1 1 1 1 1 1 M 1 1 1 1 1 J 1 1 1 1 f 1 1 1 1 1 1 J 

CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 945 



Query: 583    Sbjct: 828
Query: 643    Sbjct: 886
Query: 703    Sbjct: 946
Query: 763



TCTGGAGAAGGTCCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 

illinium miininimiiminmniiimmiiimmn 

TCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 



762 



1005 



1 1 1 1 1 1 1 1 1 1 1 Mill MUM 



mini 




1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 





Sbjct: 1006
Query: 823    Sbjct: 1066
Query: 883    Sbjct: 1126
Query: 943    Sbjct: 1186
Query: 1003   Sbjct: 1246
Query: 1063   Sbjct: 1306
Query: 1123   Sbjct: 1366



GGACTGGGATGGC AACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGACAC 1065 



GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 

IIIIIIIIIIMIIIIMIIIIIIIMIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIII 

GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 



882 



1125 



942 



ACCCAGCGGCCTCTCCGTACCCTTCCCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 

I II 1 1 1 1 II 1 1 1 1 1 1 II 1 1 II !i 1 1 II I II 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 II II I 

ACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 1185 
CAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCTGCAGCCATTCCAA 1002 

I II II II II 1 1 1 1 1 III 1 1 II Mill III II 1 1 1 III III 1 1 1 MM MM II M II III 

CAAGAACTGCGCC AAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCTGCAGCCATTCCAA 1245 



CCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAAT 

MMIM Ml Mill MMII Mill Ml IMIIMI Ml III IIMMIII II I Mill 

CCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAAT 



I II III I III I II II II II II Mill II II I III II I II II II II II III II III II I II 



1062 



1305 



II 



I IMIIMMM Mill II 



1150 



Table 35. BLASTP search using the protein of CuraGen Acc. No. CG57051-03.

>ptnr:SPTREMBL-ACC:Q9HBV4 ANGIOPOIETIN-LIKE PROTEIN PP1158 - Homo sapiens
(Human), 406 aa. (SEQ ID NO: 98)
Length = 406

Score = 1009 (355.2 bits), Expect = 4.3e-198, Sum P(2) = 4.3e-198
Identities = 184/192 (95%), Positives = 184/192 (95%)

Query: 177 NVSRLHHGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 236
           |      |||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 215 NCKMTSDGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 274

Query: 237 VQLRDWDDNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFPTWDQDHD 296
           ||||||| |||||||||||||||||||||||||||||||||||||||||||| |||||||
Sbjct: 275 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 334

Query: 297 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 356
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 335 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 394

Query: 357 MLIQPMAAEAAS 368
           ||||||||||||
Sbjct: 395 MLIQPMAAEAAS 406

Score = 934 (328.8 bits), Expect = 4.3e-198, Sum P(2) = 4.3e-198
Identities = 182/182 (100%), Positives = 182/182 (100%)

Query:   1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60

Query:  61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query: 121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180

Query: 181 LH 182
           ||
Sbjct: 181 LH 182

Table 36. BLASTN identity search of CuraGen Corporation's Human SeqCalling
database using CuraGen Acc. No. CG57051-03.

>s3aq: 189266374 Sequence 5 from Patent WO0105825 (AX079971.1: 100%/409,
p=1.2e-238), 550 bp. (SEQ ID NO:99)
Length = 550

Plus Strand HSPs:

Score = 2723 (408.6 bits), Expect = 1.8e-117, P = 1.8e-117
Identities = 547/550 (99%), Positives = 547/550 (99%), Strand = Plus / Plus

Query: 450 GAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGAGG 509
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   1 GAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGAGG 60

Query: 510 TGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGGCTC 569
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  61 TGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGGCTC 120

Query: 570 ACAATGTCAGCCGCCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCT 629
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 ACAATGTCAGCCGCCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCT 180

Query: 630 CAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCG 689
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 CAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCG 240

Query: 690 AGTTCTGGCTGGGTCTGGAGAAGGTCCATAGCATCACGGGGGACCGCAACAGCCGCCTGG 749
           ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
Sbjct: 241 AGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGG 300

Query: 750 CCGTGCAGCTGCGGGACTGGGATGACAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGG 809
           |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct: 301 CCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGG 360

Query: 810 GTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCG 869
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 GTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCG 420

Query: 870 CCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCCCCACTTGGGACCAGGATCACG 929
           |||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||
Sbjct: 421 CCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACG 480

Query: 930 ACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCT 989
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 ACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCT 540

Query: 990 GCAGCCATTC 999
           ||||||||||
Sbjct: 541 GCAGCCATTC 550



>s3aq: 188990257 Homo sapiens angiopoietin-related protein mRNA, complete cds 
(AF153606.1: 99%/476, p=1.9e-259), 652 bp. (SEQ ID NO: 100)

Length = 652

Minus Strand HSPs:

Score = 2403 (360.5 bits), Expect = 4.2e-103, P = 4.2e-103
Identities = 505/523 (96%), Positives = 505/523 (96%), Strand = Minus / Plus



AGGCTTGGCCACC - TCATGGTCTAGGTG - CTT -GTGGTCCAG -GAGGCCAAACTGGCTTT 465 

II I III I II III I II I III II II Ml lllllll IIIIIMI 

AGCCCTGGTCCCCGTCA-G-TCAATGTGACTGAGTCCGCCATTGAGGCCAGTCTGGCTTT 185 
GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCC ACCTTGT 405 

IMIIIMIIIII Mill lllllllllllll IIIIIIIIMMMIIIIIIIIIIIIIII 

GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 245 
10 Query: 404 GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 345 

llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll 

GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 305 
CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 285 

is " mi 1 1 ii i mi i inn him linn iiiiiiiiinii limn linn inn 

CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 365 
GACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGC 225 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 rr 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 r 1 1 

20 Sbjct: 366 GACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGC 425 
GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 165 

IIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 

GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 485 



Query: 520    Sbjct: 128
Query: 464    Sbjct: 186
Query: 404    Sbjct: 246
Query: 344    Sbjct: 306
Query: 284    Sbjct: 366
Query: 224    Sbjct: 426
Query: 164    Sbjct: 486
Query: 104    Sbjct: 546
Query: 44     Sbjct: 606



TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCCGCCCTGAGCGCTCA 105 

lllllll I M 1 1 1 1 1 1 1 1 M M 1 1 II 1 1 1 Ill IIIIIMI lllllll 

TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCCGCCCTGAGCGCTCA 545 



30 Query: 104 GTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCA 45 

II I I lllllllllllll I II III III lllllll II III I III III III lllllll MM I 

GTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCA 605 

TCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCT-CGGGG 1 

35 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

TCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCTTCGGGG 650 



>s3aq: 164987939 Category E: Homo sapiens angiopoietin-related protein mRNA, 
complete cds (AF153606.1: 100%/150, p=1.9e-084), 228 bp. (SEQ ID NO:101)

Length = 228

Minus Strand HSPs:

Score = 480 (72.0 bits), Expect = 2.7e-31, Sum P(2) = 2.7e-31
Identities = 96/96 (100%), Positives = 96/96 (100%), Strand = Minus / Plus

Query: 590 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 531
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 133 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 192

Query: 530 TTCTTCGGGCAGGCTTGGCCACCTCATGGTCTAGGT 495
           ||||||||||||||||||||||||||||||||||||
Sbjct: 193 TTCTTCGGGCAGGCTTGGCCACCTCATGGTCTAGGT 228

Score = 410 (61.5 bits), Expect = 2.7e-31, Sum P(2) = 2.7e-31 (SEQ ID NO:132)
Identities = 86/91 (94%), Positives = 86/91 (94%), Strand = Minus / Plus

Query: 681 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 622
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   1 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 60

Query: 621 TGGCGCCTCTGAATTACTGTCCAGCCTCCAT 591
           |||||||||||||||| ||||||  || | |
Sbjct:  61 TGGCGCCTCTGAATTAATGTCCACTCTGCCT 91



Table 37. ClustalW alignment of CG57051-03 protein with related proteins. 



Information for the ClustalW proteins:

Accno                        Common Name                             Length
CG57051-03 (SEQ ID NO:49)    novel Angiopoietin-like protein         368
Q9HBV4 (SEQ ID NO:80)        ANGIOPOIETIN-LIKE PROTEIN PP1158        406
CG57051-02 (SEQ ID NO:55)    Angiopoietin-like protein-isoform 2     386

In the alignment shown above, black outlined amino acid residues indicate residues
identically conserved between sequences (i.e., residues that may be required to preserve
structural or functional properties); amino acid residues with a gray background are similar to
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar
between sequences.

Table 38. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-03. 



outside Certainty=0.7332 (Affirmative) < succ>
microbody (peroxisome) Certainty=0.2527 (Affirmative) < succ>
endoplasmic reticulum (membrane) Certainty=0.1000 (Affirmative) < succ>
endoplasmic reticulum (lumen) Certainty=0.1000 (Affirmative) < succ>

Is the sequence a signal peptide?

# Measure Position Value Cutoff Conclusion
max. C 31 0.306 0.37 NO
max. Y 26 0.429 0.34 YES
max. S 8 0.952 0.88 YES
mean S 1-25 0.848 0.48 YES

# Most likely cleavage site between pos. 25 and 26: AQG-GP
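
The Conclusion column of the SignalP output follows from comparing each measured value with its cutoff. The sketch below reproduces the column for the Table 38 values; the class and function names are hypothetical, and the strict "value greater than cutoff" rule is an assumption that is consistent with every row shown in this document.

```python
from typing import NamedTuple


class Measure(NamedTuple):
    name: str       # e.g. "max. C"
    position: str   # residue position or range, kept as text
    value: float    # score reported for the sequence
    cutoff: float   # threshold for a positive call


def conclusion(measure: Measure) -> str:
    """Return "YES" when the score exceeds its cutoff, otherwise "NO"."""
    return "YES" if measure.value > measure.cutoff else "NO"


# Values copied from Table 38 (CuraGen Acc. No. CG57051-03).
TABLE_38 = [
    Measure("max. C", "31", 0.306, 0.37),
    Measure("max. Y", "26", 0.429, 0.34),
    Measure("max. S", "8", 0.952, 0.88),
    Measure("mean S", "1-25", 0.848, 0.48),
]

for m in TABLE_38:
    print(m.name, m.position, m.value, m.cutoff, conclusion(m))
```

Only the max. C measure falls below its cutoff here, while the Y and S scores support the presence of a signal peptide.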



CG57051-04 directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 20 illustrates the nucleic acid and amino acid sequences,
respectively. This clone includes a nucleotide sequence (SEQ ID NO:50) of 937 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 881-883.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein having 242 amino acid residues is
presented using the one-letter code in Figure 20. The protein encoded by clone CG57051-04 is
predicted by the PSORT program to be located at the endoplasmic reticulum with a certainty of
0.8200, and appears to be a signal protein (see Table 27 below).

SECP Nucleic Acids 

The novel nucleic acids of the invention include those that encode a SECP or SECP-like
protein, or biologically-active portions thereof. The nucleic acids include nucleic acids encoding
polypeptides that include the amino acid sequence of one or more of SEQ ID NO:1, 3, 5, 7, 9,
11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. The encoded polypeptides can thus include,
e.g., the amino acid sequences of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50,
52, 54 and 56. In some embodiments, a SECP polypeptide or protein, as disclosed herein,
includes the product of a naturally-occurring polypeptide, precursor form, pro-protein, or mature
form of the polypeptide. The naturally-occurring polypeptide, precursor, or pro-protein includes,
e.g., the full-length gene product, encoded by the corresponding gene. The naturally-occurring
polypeptide also includes the polypeptide, precursor or pro-protein encoded by an open reading
frame (ORF) described herein. As used herein, the term "identical" residues corresponds to
those residues in a comparison between two sequences where the equivalent nucleotide base or
amino acid residue in an alignment of two sequences is the same residue. Residues are
alternatively described as "similar" or "positive" when the comparisons between two sequences
in an alignment show that residues in an equivalent position in a comparison are either the same
amino acid residue or a conserved amino acid residue, as defined below.

As used herein, a "mature" form of a polypeptide or protein disclosed in the present
invention is the product of a naturally occurring polypeptide or precursor form or proprotein.
The naturally occurring polypeptide, precursor or proprotein includes, by way of nonlimiting
example, the full length gene product, encoded by the corresponding gene. Alternatively, it may
be defined as the polypeptide, precursor or proprotein encoded by an open reading frame
described herein. The product "mature" form arises, again by way of nonlimiting example, as a
result of one or more naturally occurring processing steps as they may take place within the cell,
or host cell, in which the gene product arises. Examples of such processing steps leading to a
"mature" form of a polypeptide or protein include the cleavage of the amino-terminal methionine
residue encoded by the initiation codon of an open reading frame, or the proteolytic cleavage of a
signal peptide or leader sequence. Thus, a mature form arising from a precursor polypeptide or
protein that has residues 1 to N, where residue 1 is the amino-terminal methionine, would have
residues 2 through N remaining after removal of the amino-terminal methionine. Alternatively, a
mature form arising from a precursor polypeptide or protein having residues 1 to N, in which an
amino-terminal signal sequence from residue 1 to residue M is cleaved, would have the residues
from residue M+1 to residue N remaining. Further, as used herein, a "mature" form of a
polypeptide or protein may arise from a step of post-translational modification other than a
proteolytic cleavage event. Such additional processes include, by way of non-limiting example,
glycosylation, myristoylation or phosphorylation. In general, a mature polypeptide or protein
may result from the operation of only one of these processes, or a combination of any of them.
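
The residue bookkeeping in the definition above (residues 2 through N after methionine removal, or M+1 through N after signal peptide cleavage) can be sketched as a simple slicing operation. This is an illustration only; the function name is hypothetical, and the example uses the first 32 residues of the CG57051-03 protein as they appear in the Table 35 alignment together with the cleavage positions reported by SignalP.

```python
def mature_form(precursor: str, signal_length: int) -> str:
    """Return residues M+1..N of a precursor whose first M residues are cleaved.

    signal_length is M; use M = 1 to model removal of only the initiator Met.
    """
    if not 0 <= signal_length <= len(precursor):
        raise ValueError("signal_length must lie between 0 and the precursor length")
    return precursor[signal_length:]


# First 32 residues of the CG57051-03 protein (from the Table 35 alignment).
n_terminus = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKS"

# Cleavage between residues 25 and 26 (AQG-GP): M = 25.
print(n_terminus[22:25], "-", mature_form(n_terminus, 25)[:2])

# Removal of the amino-terminal methionine only: M = 1.
print(mature_form(n_terminus, 1)[:5])
```

With M = 25 the retained fragment begins with GP, matching the AQG-GP cleavage site, and with M = 30 it begins with KS, matching the VQS-KS site reported for the preceding clone.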

In some embodiments, a nucleic acid encoding a polypeptide having the amino acid
sequence of one or more of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53,
55 and 57, includes the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17,
40, 42, 44, 46, 48, 50, 52, 54, and 56, or a
fragment thereof. Additionally, the invention includes mutant or variant nucleic acids of any of
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a fragment
thereof, any of whose bases may be changed from the disclosed sequence while still encoding a
protein that maintains its SECP-like biological activities and physiological functions. The
invention further includes the complement of the nucleic acid sequence of any of SEQ ID NO:1,
3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, including fragments, derivatives,
analogs and homologs thereof. The invention additionally includes nucleic acids or nucleic acid
fragments, or complements thereto, whose structures include chemical modifications.

Also included are nucleic acid fragments sufficient for use as hybridization probes to 
identify SECP-encoding nucleic acids (e.g., SECP mRNA) and fragments for use as polymerase 
chain reaction (PCR) primers for the amplification or mutation of SECP nucleic acid molecules. 
As used herein, the term "nucleic acid molecule" is intended to include DNA molecules (e.g.,
cDNA or genomic DNA), RNA molecules (e.g., mRNA), analogs of the DNA or RNA generated
using nucleotide analogs, and derivatives, fragments, and homologs thereof. The nucleic acid 
molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA. 

The term "probes" refers to nucleic acid sequences of variable length, preferably
at least about 10 nucleotides (nt), 100 nt, or as many as about, e.g., 6,000 nt, depending upon the
specific use. Probes are used in the detection of identical, similar, or complementary nucleic
acid sequences. Longer length probes are usually obtained from a natural or recombinant source,
are highly specific, and are much slower to hybridize than oligomers. Probes may be single- or
double-stranded, and may also be designed to have specificity in PCR, membrane-based
hybridization technologies, or ELISA-like technologies.

An "isolated" nucleic acid molecule is a nucleic acid that is separated from other
nucleic acid molecules that are present in the natural source of the nucleic acid. Examples of
isolated nucleic acid molecules include, but are not limited to, recombinant DNA molecules
contained in a vector, recombinant DNA molecules maintained in a heterologous host cell,
partially or substantially purified nucleic acid molecules, and synthetic DNA or RNA molecules.
Preferably, an "isolated" nucleic acid is free of sequences which naturally flank the nucleic acid
(i.e., sequences located at the 5'- and 3'-termini of the nucleic acid) in the genomic DNA of the
organism from which the nucleic acid is derived. For example, in various embodiments, the
isolated SECP nucleic acid molecule can contain less than approximately 50 kb, 25 kb, 5 kb,
4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic
acid molecule in genomic DNA of the cell from which the nucleic acid is derived. Moreover, an
"isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other
cellular material or culture medium when produced by recombinant techniques, or of chemical
precursors or other chemicals when chemically synthesized.

A nucleic acid molecule of the invention, e.g., a nucleic acid molecule having the
nucleotide sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and
56, or a complement of any of these nucleotide sequences, can be isolated using standard
molecular biology techniques and the sequence information provided herein. Using all or a
portion of the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44,
46, 48, 50, 52, 54 and 56 as a hybridization probe, SECP nucleic acid sequences can be isolated
using standard hybridization and cloning techniques (e.g., as described in Sambrook et al., eds.,
Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY, 1989; and Ausubel, et al., eds., Current Protocols in Molecular
Biology, John Wiley & Sons, New York, NY, 1993).

A nucleic acid of the invention can be amplified using cDNA, mRNA or alternatively, 
genomic DNA, as a template and appropriate oligonucleotide primers according to standard PCR 
amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector 
and characterized by DNA sequence analysis. Furthermore, oligonucleotides corresponding to
SECP nucleotide sequences can be prepared by standard synthetic techniques, e.g., using an 
automated DNA synthesizer. 

As used herein, the term "oligonucleotide" refers to a series of linked nucleotide residues,
which oligonucleotide has a sufficient number of nucleotide bases to be used in a PCR reaction.
A short oligonucleotide sequence may be based on, or designed from, a genomic or cDNA
sequence and is used to amplify, confirm, or reveal the presence of an identical, similar or
complementary DNA or RNA in a particular cell or tissue. Oligonucleotides comprise portions
of a nucleic acid sequence having about 10 nt, 50 nt, or 100 nt in length, preferably about 15 nt
to 30 nt in length. In one embodiment, an oligonucleotide comprising a nucleic acid molecule
less than 100 nt in length would further comprise at least 6 contiguous nucleotides of any of
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a complement
thereof. Oligonucleotides may be chemically synthesized and may also be used as probes.

In another embodiment, an isolated nucleic acid molecule of the invention comprises a
nucleic acid molecule that is a complement of the nucleotide sequence shown in any of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. In still another
embodiment, an isolated nucleic acid molecule of the invention comprises a nucleic acid
molecule that is a complement of the nucleotide sequence shown in any of SEQ ID NO:1, 3, 5, 7,
9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a portion of this nucleotide sequence.
A nucleic acid molecule that is complementary to such a nucleotide sequence is one that
is sufficiently complementary to the nucleotide sequence shown in any of SEQ ID NO:1, 3, 5,
7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 that it can hydrogen bond with little or
no mismatches to the nucleotide sequence shown in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13,
15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, thereby forming a stable duplex.
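
As a rough illustration of "little or no mismatches," the sketch below counts non-pairing positions when two equal-length DNA strands are aligned antiparallel. It assumes Watson-Crick pairing only (Hoogsteen pairing is not modeled), and the function name and short sequences are hypothetical. What mismatch count still permits a stable duplex depends on length and conditions and is not decided by this code.

```python
# Watson-Crick partners for DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def count_mismatches(strand, candidate):
    """Count positions where candidate fails to pair with strand.

    Both strands are written 5'->3'. Because a duplex is antiparallel,
    the candidate is read in reverse before comparing base by base.
    """
    if len(strand) != len(candidate):
        raise ValueError("strands must be the same length")
    reversed_candidate = reversed(candidate.upper())
    return sum(
        1
        for base, partner in zip(strand.upper(), reversed_candidate)
        if PAIRS.get(base) != partner
    )


# Hypothetical 6-nt strands, for illustration only.
perfect = count_mismatches("ATGGCC", "GGCCAT")  # exact reverse complement
one_off = count_mismatches("ATGGCC", "GGCCAA")  # last base does not pair
```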

As used herein, the term "complementary" refers to Watson-Crick or Hoogsteen
base-pairing between nucleotide units of a nucleic acid molecule, whereas the term "binding" is
defined as the physical or chemical interaction between two polypeptides or compounds or
associated polypeptides or compounds or combinations thereof. Binding includes ionic,
non-ionic, van der Waals, hydrophobic interactions, and the like. A physical interaction can be
either direct or indirect. Indirect interactions may be through or due to the effects of another
polypeptide or compound. Direct binding refers to interactions that do not take place through, or
due to, the effect of another polypeptide or compound, but instead are without other substantial
chemical intermediates.

Additionally, the nucleic acid molecule of the invention can comprise only a portion of
the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17,
40, 42, 44, 46, 48, 50, 52, 54, and 56, e.g., a fragment that can be used
as a probe or primer, or a fragment encoding a biologically active portion of SECP. Fragments
provided herein are defined as sequences of at least 6 (contiguous) nucleic acids or at least 4
(contiguous) amino acids, a length sufficient to allow for specific hybridization in the case of
nucleic acids or for specific recognition of an epitope in the case of amino acids, respectively,
and are at most some portion less than a full length sequence. Fragments may be derived from
any contiguous portion of a nucleic acid or amino acid sequence of choice. Derivatives are
nucleic acid sequences or amino acid sequences formed from the native compounds either
directly or by modification or partial substitution. Analogs are nucleic acid sequences or amino
acid sequences that have a structure similar to, but not identical to, the native compound but
differ from it in respect to certain components or side chains. Analogs may be synthetic or from
a different evolutionary origin and may have a similar or opposite metabolic activity compared to
wild-type.

Derivatives and analogs may be full-length or other than full-length, if the derivative or
analog contains a modified nucleic acid or amino acid, as described below. Derivatives or
analogs of the nucleic acids or proteins of the invention include, but are not limited to, molecules
comprising regions that are substantially homologous to the nucleic acids or proteins of the
invention, in various embodiments, by at least about 70%, 80%, 85%, 90%, 95%, 98%, or even
99% identity (with a preferred identity of 80-99%) over a nucleic acid or amino acid sequence of
identical size or when compared to an aligned sequence in which the alignment is done by a
computer homology program known in the art, or whose encoding nucleic acid is capable of
hybridizing to the complement of a sequence encoding the aforementioned proteins under
stringent, moderately stringent, or low stringency conditions. See, e.g., Ausubel, et al., Current
Protocols in Molecular Biology, John Wiley & Sons, New York, NY, 1993, and below.
An exemplary program is the Gap program (Wisconsin Sequence Analysis Package, Version 8
for UNIX, Genetics Computer Group, University Research Park, Madison, WI) using the default
settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2: 482-
489), which is incorporated herein by reference in its entirety.
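
For sequences that have already been aligned, percent identity can be sketched as below. This is not the Gap program and performs no alignment itself. It assumes two equal-length aligned strings with "-" marking gaps, and it counts a gap opposite a residue as a non-identical column, which is only one of several conventions in use. The function name and sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns holding the same residue in both sequences.

    Columns that are gaps in both sequences are ignored. A gap opposite a
    residue counts as a non-identical column (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned 10-column sequences differing at one position.
identity = percent_identity("ATGGCCAAGT", "ATGGCTAAGT")
```

Under this convention, a value of at least 70 would correspond to the lowest of the identity thresholds recited above.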

The term "homologous nucleic acid sequence" or "homologous amino acid sequence," or
variations thereof, refers to sequences characterized by a homology at the nucleotide level or
amino acid level as previously discussed. Homologous nucleotide sequences include those
sequences coding for isoforms of SECP polypeptide. Isoforms can be expressed in different
tissues of the same organism as a result of, e.g., alternative splicing of RNA. Alternatively,
isoforms can be encoded by different genes. In the invention, homologous nucleotide sequences
include nucleotide sequences encoding a SECP polypeptide of species other than humans,
including, but not limited to, mammals, and thus can include, e.g., mouse, rat, rabbit, dog, cat,
cow, horse, and other organisms. Homologous nucleotide sequences also include, but are not
limited to, naturally occurring allelic variations and mutations of the nucleotide sequences set
forth herein. A homologous nucleotide sequence does not, however, include the nucleotide
sequence encoding human SECP protein. Homologous nucleic acid sequences include those
nucleic acid sequences that encode conservative amino acid substitutions (see below) in any of
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, as well as a
polypeptide having SECP activity. Biological activities of the SECP proteins are described
below. A homologous amino acid sequence does not include the amino acid sequence of a
human SECP polypeptide.

The nucleotide sequence determined from the cloning of the human SECP gene allows
for the generation of probes and primers designed for use in identifying the cell types disclosed
and/or cloning SECP homologues in other cell types, e.g., from other tissues, as well as SECP
homologues from other mammals. The probe/primer typically comprises a substantially-purified
oligonucleotide. The oligonucleotide typically comprises a region of nucleotide sequence that
hybridizes under stringent conditions to at least about 12, 25, 50, 100, 150, 200, 250, 300, 350 or
400 or more consecutive nucleotides of a sense strand nucleotide sequence of SEQ ID NO:1, 3,
5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56; or an anti-sense strand nucleotide
sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or of a
naturally occurring mutant of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52,
54 and 56.

Probes based upon the human SECP nucleotide sequence can be used to detect transcripts 
or genomic sequences encoding the same or homologous proteins. In various embodiments, the 
probe further comprises a label group attached thereto, e.g., the label group can be a 
radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be 
used as a part of a diagnostic test kit for identifying cells or tissue which mis-express a SECP
protein, such as by measuring a level of a SECP-encoding nucleic acid in a sample of cells from
a subject, e.g., detecting SECP mRNA levels or determining whether a genomic SECP gene has
been mutated or deleted. 

The term "a polypeptide having a biologically-active portion of SECP" refers to
polypeptides exhibiting activity similar, but not necessarily identical to, an activity of a
polypeptide of the invention, including mature forms, as measured in a particular biological
assay, with or without dose dependency. A nucleic acid fragment encoding a
"biologically-active portion of SECP" can be prepared by isolating a portion of SEQ ID NO:1, 3,
5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 that encodes a polypeptide having a
SECP biological activity, expressing the encoded portion of SECP protein (e.g., by recombinant
expression in vitro), and assessing the activity of the encoded portion of SECP.

SECP Variants 

The invention further encompasses nucleic acid molecules that differ from the disclosed
SECP nucleotide sequences due to degeneracy of the genetic code. These nucleic acids therefore
encode the same SECP protein as those encoded by the nucleotide sequence shown in SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. In another embodiment, an
isolated nucleic acid molecule of the invention has a nucleotide sequence encoding a protein
having an amino acid sequence shown in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42,
44, 46, 48, 50, 52, 54 and 56.
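
Degeneracy of the genetic code can be illustrated with a short sketch that translates two different nucleotide sequences into the same peptide. The standard genetic code is assumed, and the two 9-nt sequences are hypothetical examples, not sequences of the invention.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA sequence in frame, stopping at a stop codon."""
    dna = dna.upper()
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Two hypothetical sequences that differ at two third-codon positions.
seq_a = "ATGGCTAAA"
seq_b = "ATGGCCAAG"
differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
same_protein = translate(seq_a) == translate(seq_b)
```

Both sequences encode Met-Ala-Lys although they differ at two nucleotide positions, which is the kind of variation contemplated in the paragraph above.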

In addition to the human SECP nucleotide sequence shown in any of SEQ ID NO:1, 3, 5,
7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, it will be appreciated by those skilled in
the art that DNA sequence polymorphisms that lead to changes in the amino acid sequences of
SECP may exist within a population (e.g., the human population). Such genetic polymorphism
in the SECP gene may exist among individuals within a population due to natural allelic
variation. As used herein, the terms "gene" and "recombinant gene" refer to nucleic acid
molecules comprising an open reading frame encoding a SECP protein, preferably a mammalian
SECP protein. Such natural allelic variations can typically result in 1-5% variance in the
nucleotide sequence of the SECP gene. Any and all such nucleotide variations and resulting
amino acid polymorphisms in SECP that are the result of natural allelic variation and that do not
alter the functional activity of SECP are intended to be within the scope of the invention.

Additionally, nucleic acid molecules encoding SECP proteins from other species, and
thus that have a nucleotide sequence that differs from the human sequence of any of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 are intended to be within the
scope of the invention. Nucleic acid molecules corresponding to natural allelic variants and
homologues of the SECP cDNAs of the invention can be isolated based on their homology to the
human SECP nucleic acids disclosed herein using the human cDNAs, or a portion thereof, as a
hybridization probe according to standard hybridization techniques under stringent hybridization
conditions.

In another embodiment, an isolated nucleic acid molecule of the invention is at least 6
nucleotides in length and hybridizes under stringent conditions to the nucleic acid molecule
comprising the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44,
46, 48, 50, 52, 54 and/or 56. In another embodiment, the nucleic acid is at least 10, 25, 50, 100,
250, 500 or 750 nucleotides in length. In yet another embodiment, an isolated nucleic acid
molecule of the invention hybridizes to the coding region. As used herein, the term "hybridizes
under stringent conditions" is intended to describe conditions for hybridization and washing
under which nucleotide sequences at least 60% homologous to each other typically remain
hybridized to each other.

Homologs (i.e., nucleic acids encoding SECP proteins derived from species other than
human) or other related sequences (e.g., paralogs) can be obtained by low, moderate or high 
stringency hybridization with all or a portion of the particular human sequence as a probe using 
methods well known in the art for nucleic acid hybridization and cloning. 

As used herein, the phrase "stringent hybridization conditions" refers to conditions under
which a probe, primer or oligonucleotide will hybridize to its target sequence, but to no other
sequences. Stringent conditions are sequence-dependent and will be different in different
circumstances. Longer sequences hybridize specifically at higher temperatures than shorter
sequences. Generally, stringent conditions are selected to be about 5°C lower than the thermal
melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the
temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of
the probes complementary to the target sequence hybridize to the target sequence at equilibrium.
Since the target sequences are generally present in excess, at Tm, 50% of the probes are occupied
at equilibrium. Typically, stringent conditions will be those in which the salt concentration is
less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion (or other salts) at pH
7.0 to 8.3 and the temperature is at least about 30°C for short probes, primers or oligonucleotides 
(e.g., 10 nt to 50 nt) and at least about 60°C for longer probes, primers and oligonucleotides. 
Stringent conditions may also be achieved with the addition of destabilizing agents, such as 
formamide. 
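
The relationship between the melting point and the hybridization temperature can be sketched numerically. The text above does not prescribe how Tm is determined. The sketch below substitutes the Wallace rule, a common rule-of-thumb approximation for short oligonucleotides (roughly 14 to 20 nt) in high-salt buffer, and then subtracts the approximately 5°C offset described above. The function names and the 16-nt probe are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    This rule of thumb is an assumption made for illustration. It ignores
    ionic strength, pH, and probe concentration, all of which affect the
    true melting point.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


def stringent_temperature(oligo, offset=5):
    """Hybridization temperature set about `offset` degrees C below Tm."""
    return wallace_tm(oligo) - offset


# Hypothetical 16-nt probe with 8 A/T and 8 G/C bases.
probe = "ATGCATGCATGCATGC"
tm = wallace_tm(probe)
hyb_temp = stringent_temperature(probe)
```

For longer probes, or when salt and formamide concentrations matter, an empirically calibrated formula or measurement would be needed instead of this approximation.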

Stringent conditions are known to those skilled in the art and can be found in Current
Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. Preferably,
the conditions are such that sequences at least about 65%, 70%, 75%, 85%, 90%, 95%, 98%, or
99% homologous to each other typically remain hybridized to each other. A non-limiting
example of stringent hybridization conditions is hybridization in a high salt buffer comprising
6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and
500 mg/ml denatured salmon sperm DNA at 65°C. This hybridization is followed by one or
more washes in 0.2X SSC, 0.01% BSA at 50°C. An isolated nucleic acid molecule of the
invention that hybridizes under stringent conditions to the sequence of any of SEQ ID NO:1, 3,
5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 corresponds to a naturally occurring
nucleic acid molecule. As used herein, a "naturally-occurring" nucleic acid molecule refers to an
RNA or DNA molecule having a nucleotide sequence that occurs in nature (e.g., encodes a
natural protein).

In a second embodiment, a nucleic acid sequence that is hybridizable to the nucleic acid
molecule comprising the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17,
40, 42, 44, 46, 48, 50, 52, 54 and/or 56, or fragments, analogs or derivatives thereof, under
conditions of moderate stringency is provided. A non-limiting example of moderate stringency
hybridization conditions is hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and
100 mg/ml denatured salmon sperm DNA at 55°C, followed by one or more washes in 1X SSC,
0.1% SDS at 37°C. Other conditions of moderate stringency that may be used are well known in
the art. See, e.g., Ausubel et al. (eds.), 1993, Current Protocols in Molecular Biology,
John Wiley & Sons, NY, and Kriegler, 1990, Gene Transfer and Expression, A Laboratory
Manual, Stockton Press, NY.

In a third embodiment, a nucleic acid that is hybridizable to the nucleic acid molecule
comprising the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44,
46, 48, 50, 52, 54 and 56, or fragments, analogs or derivatives thereof, under conditions of low
stringency, is provided. A non-limiting example of low stringency hybridization conditions is
hybridization in 35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.02% PVP,
0.02% Ficoll, 0.2% BSA, 100 mg/ml denatured salmon sperm DNA, 10% (wt/vol) dextran
sulfate at 40°C, followed by one or more washes in 2X SSC, 25 mM Tris-HCl (pH 7.4), 5 mM
EDTA, and 0.1% SDS at 50°C. Other conditions of low stringency that may be used are well
known in the art (e.g., as employed for cross-species hybridizations). See, e.g., Ausubel, et al.
(eds.), 1993, Current Protocols in Molecular Biology, John Wiley & Sons, NY, and
Kriegler, 1990, Gene Transfer and Expression, A Laboratory Manual, Stockton Press,
NY; Shilo and Weinberg, 1981, Proc. Natl. Acad. Sci. USA 78: 6789-6792.

Conservative Mutations 

In addition to naturally-occurring allelic variants of the SECP sequence that may exist in
the population, the skilled artisan will further appreciate that changes can be introduced by
mutation into the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42,
44, 46, 48, 50, 52, 54 and 56, thereby leading to changes in the amino acid sequence of the
encoded SECP protein, without altering the functional ability of the SECP protein. For example,
nucleotide substitutions leading to amino acid substitutions at "non-essential" amino acid
residues can be made in the sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42,
44, 46, 48, 50, 52, 54 and 56. A "non-essential" amino acid residue is a residue that can be
altered from the wild-type sequence of SECP without altering the biological activity, whereas an
"essential" amino acid residue is required for biological activity. For example, amino acid
residues that are conserved among the SECP proteins of the invention are predicted to be
particularly non-amenable to such alteration.

Amino acid residues that are conserved among members of a SECP family are
predicted to be less amenable to alteration. For example, a SECP protein according to the
invention can contain at least one domain that is a typically conserved region in a SECP family
member. As such, these conserved domains are not likely to be amenable to mutation. Other
amino acid residues, however, (e.g., those that are not conserved or only semi-conserved among
members of the SECP family) may not be as essential for activity and thus are more likely to be 
amenable to alteration. 

Another aspect of the invention pertains to nucleic acid molecules encoding SECP
proteins that contain changes in amino acid residues that are not essential for activity. Such
SECP proteins differ in amino acid sequence from any of SEQ ID NO:2, 4, 6, 8, 10, 12,
14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57, yet retain biological activity. In one
embodiment, the isolated nucleic acid molecule comprises a nucleotide sequence encoding a
protein, wherein the protein comprises an amino acid sequence at least about 75% homologous
to the amino acid sequence of any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49,
51, 53, 55 and 57. Preferably, the protein encoded by the nucleic acid is at least about 80%
homologous to any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and
57, more preferably at least about 90%, 95%, 98%, and most preferably at least about 99%
homologous to SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57.

An isolated nucleic acid molecule encoding a SECP protein homologous to the protein of
any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57, can be
created by introducing one or more nucleotide substitutions, additions or deletions into the
corresponding nucleotide sequence (i.e., SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46,
48, 50, 52, 54 and/or 56), such that one or more amino acid substitutions, additions or deletions
are introduced into the encoded protein.

Mutations can be introduced into SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46,
48, 50, 52, 54 and/or 56 by standard techniques, such as site-directed mutagenesis and
PCR-mediated mutagenesis. Preferably, conservative amino acid substitutions are made at one
or more predicted non-essential amino acid residues. A "conservative amino acid substitution" is
one in which the amino acid residue is replaced with an amino acid residue having a similar side
chain. Families of amino acid residues having similar side chains have been defined in the art.
These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic
side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine,
asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine,
valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), β-branched side
chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine,
tryptophan, histidine). Thus, a predicted nonessential amino acid residue in SECP is replaced
with another amino acid residue from the same side chain family. Alternatively, in another
embodiment, mutations can be introduced randomly along all or part of a SECP coding
sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for SECP
biological activity to identify mutants that retain activity. Following mutagenesis of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56, the encoded protein can
be expressed by any recombinant technology known in the art and the activity of the protein can
be determined.
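
The side chain families listed above lend themselves to a simple lookup. The sketch below encodes those families with one-letter amino acid codes and treats a substitution as conservative when both residues share at least one family. The function name is hypothetical, and the membership test is an illustration of the definition above, not a prediction of whether activity is retained.

```python
# Side chain families as listed in the text, in one-letter amino acid codes.
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def is_conservative(original, replacement):
    """Return True if both residues belong to at least one common family."""
    original, replacement = original.upper(), replacement.upper()
    return any(
        original in members and replacement in members
        for members in SIDE_CHAIN_FAMILIES.values()
    )


lys_to_arg = is_conservative("K", "R")  # both basic
lys_to_asp = is_conservative("K", "D")  # basic versus acidic
thr_to_val = is_conservative("T", "V")  # both beta-branched
```

Because some residues appear in more than one family (for example, histidine is listed as both basic and aromatic), the check accepts a substitution if any single family contains both residues.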

In one embodiment, a mutant SECP protein can be assayed for: (i) the ability to form
protein:protein interactions with other SECP proteins, other cell-surface proteins, or
biologically-active portions thereof; (ii) complex formation between a mutant SECP protein and a
SECP receptor; (iii) the ability of a mutant SECP protein to bind to an intracellular target protein
or biologically active portion thereof (e.g., avidin proteins); (iv) the ability to bind BRA protein; or
(v) the ability to specifically bind an anti-SECP protein antibody.

Antisense Nucleic Acids 

Another aspect of the invention pertains to isolated antisense nucleic acid molecules that
are hybridizable to or complementary to the nucleic acid molecule comprising the nucleotide
sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 or
fragments, analogs or derivatives thereof. An "antisense" nucleic acid comprises a nucleotide
sequence that is complementary to a "sense" nucleic acid encoding a protein, e.g.,
complementary to the coding strand of a double-stranded cDNA molecule or complementary to
an mRNA sequence. In specific aspects, antisense nucleic acid molecules are provided that
comprise a sequence complementary to at least about 10, 25, 50, 100, 250 or 500 nucleotides or
an entire SECP coding strand, or to only a portion thereof. Nucleic acid molecules encoding
fragments, homologs, derivatives and analogs of a SECP protein of any of SEQ ID NO:2, 4, 6, 8,
10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55, and 57,
or antisense nucleic acids complementary to a SECP nucleic acid sequence of SEQ ID
NO:2, 4, 6, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57, are additionally provided.

In one embodiment, an antisense nucleic acid molecule is antisense to a "coding region"
of the coding strand of a nucleotide sequence encoding SECP. The term "coding region" refers
to the region of the nucleotide sequence comprising codons which are translated into amino acid
residues (e.g., the protein coding region of a human SECP that corresponds to any of SEQ ID
NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57).

In another embodiment, the antisense nucleic acid molecule is antisense to a
"non-coding region" of the coding strand of a nucleotide sequence encoding SECP. The term
"non-coding region" refers to 5'- and 3'-terminal sequences which flank the coding region that are
not translated into amino acids (i.e., also referred to as 5' and 3' non-translated regions).

Given the coding strand sequences encoding the SECP proteins disclosed herein (e.g.,
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56), antisense
nucleic acids of the invention can be designed according to the rules of Watson and Crick or
Hoogsteen base-pairing. The antisense nucleic acid molecule can be complementary to the entire
coding region of SECP mRNA, but more preferably is an oligonucleotide that is antisense to
only a portion of the coding or non-coding region of SECP mRNA. For example, the antisense
oligonucleotide can be complementary to the region surrounding the translation start site of
SECP mRNA. An antisense oligonucleotide can be, for example, about 5, 10, 15, 20, 25, 30, 35,
40, 45 or 50 nucleotides in length. An antisense nucleic acid of the invention can be constructed
using chemical synthesis or enzymatic ligation reactions using procedures known in the art. For
example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be chemically
synthesized using naturally-occurring nucleotides or variously modified nucleotides designed to
increase the biological stability of the molecules or to increase the physical stability of the
duplex formed between the antisense and sense nucleic acids, e.g., phosphorothioate derivatives
and acridine-substituted nucleotides can be used.
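
Designing an antisense oligonucleotide against the region surrounding the translation start site can be sketched as taking the reverse complement of a window of the coding strand. The sketch assumes Watson-Crick pairing only, treats the first ATG in the supplied sequence as the start codon (a simplifying assumption), and uses a hypothetical coding-strand fragment. The function name and the parameters length and upstream are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(coding_strand, length=20, upstream=5):
    """Return an antisense DNA oligo (5'->3') spanning the first ATG.

    The target window starts `upstream` nucleotides before the first ATG
    (or at the start of the sequence, if closer) and is `length` nt long.
    """
    seq = coding_strand.upper()
    start = seq.find("ATG")
    if start < 0:
        raise ValueError("no ATG start codon found")
    begin = max(0, start - upstream)
    target = seq[begin:begin + length]
    # The reverse complement pairs antiparallel with the sense-strand target.
    return target.translate(COMPLEMENT)[::-1]


# Hypothetical coding-strand fragment, for illustration only.
fragment = "GGCACCATGGCTAAGTTCGTAGCTAGG"
oligo = antisense_oligo(fragment, length=15, upstream=5)
```

The resulting oligonucleotide contains CAT, the reverse complement of the ATG start codon. Chemical modifications such as phosphorothioate linkages are outside the scope of this sequence-level sketch.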

Examples of modified nucleotides that can be used to generate the antisense nucleic acid
include: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine,
4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine,
inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine,
7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil,
queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil,
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Alternatively, the
antisense nucleic acid can be produced biologically using an expression vector into which a
nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the
inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest,
described further in the following subsection).

The antisense nucleic acid molecules of the invention are typically administered to a
subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or
genomic DNA encoding a SECP protein to thereby inhibit expression of the protein, e.g., by
inhibiting transcription and/or translation. The hybridization can be by conventional nucleotide
complementarity to form a stable duplex, or, for example, in the case of an antisense nucleic acid
molecule that binds to DNA duplexes, through specific interactions in the major groove of the
double helix. An example of a route of administration of antisense nucleic acid molecules of the
invention includes direct injection at a tissue site. Alternatively, antisense nucleic acid
molecules can be modified to target selected cells and then administered systemically. For
example, for systemic administration, antisense molecules can be modified such that they
specifically bind to receptors or antigens expressed on a selected cell surface (e.g., by linking the
antisense nucleic acid molecules to peptides or antibodies that bind to cell surface receptors or
antigens). The antisense nucleic acid molecules can also be delivered to cells using the vectors
described herein. To achieve sufficient intracellular concentrations of antisense molecules,
vector constructs in which the antisense nucleic acid molecule is placed under the control of a
strong pol II or pol III promoter are preferred.

In yet another embodiment, the antisense nucleic acid molecule of the invention is an
α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific
double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the
strands run parallel to each other (see, Gaultier, et al., 1987. Nucl. Acids Res. 15: 6625-6641).
The antisense nucleic acid molecule can also comprise a 2'-o-methylribonucleotide (Inoue, et al.,
1987. Nucl. Acids Res. 15: 6131-6148) or a chimeric RNA-DNA analogue (Inoue, et al., 1987.
FEBS Lett. 215: 327-330).



Ribozymes and PNA Moieties 

Such modifications include, by way of non-limiting example, modified bases, and nucleic 
acids whose sugar phosphate backbones are modified or derivatized. These modifications are 
carried out at least in part to enhance the chemical stability of the modified nucleic acid, such 
that they may be used, for example, as antisense binding nucleic acids in therapeutic applications
in a subject. 

In still another embodiment, an antisense nucleic acid of the invention is a ribozyme.
Ribozymes are catalytic RNA molecules with ribonuclease activity that are capable of cleaving a
single-stranded nucleic acid, such as an mRNA, to which they have a complementary region.
Thus, ribozymes (e.g., hammerhead ribozymes; described by Haseloff and Gerlach, 1988.
Nature 334: 585-591) can be used to catalytically cleave SECP mRNA transcripts to thereby
inhibit translation of SECP mRNA. A ribozyme having specificity for a SECP-encoding nucleic
acid can be designed based upon the nucleotide sequence of a SECP DNA disclosed herein (i.e.,
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56). For example, a
derivative of a Tetrahymena L-19 IVS RNA can be constructed in which the nucleotide sequence
of the active site is complementary to the nucleotide sequence to be cleaved in a SECP-encoding
mRNA. See, e.g., Cech, et al., U.S. Patent No. 4,987,071; and Cech, et al., U.S. Patent No.
5,116,742. Alternatively, SECP mRNA can be used to select a catalytic RNA having a specific
ribonuclease activity from a pool of RNA molecules (Bartel, et al., 1993. Science 261:
1411-1418).

Alternatively, SECP gene expression can be inhibited by targeting nucleotide sequences
complementary to the regulatory region of the SECP (e.g., the SECP promoter and/or enhancers)
to form triple helical structures that prevent transcription of the SECP gene in target cells. See,
e.g., Helene, 1991. Anticancer Drug Des. 6: 569-84; Helene, et al., 1992. Ann. N.Y. Acad. Sci.
660: 27-36; and Maher, 1992. BioEssays 14: 807-15.

In various embodiments, the nucleic acids of SECP can be modified at the base moiety,
sugar moiety or phosphate backbone to improve, e.g., the stability, hybridization, or solubility of
the molecule. For example, the deoxyribose phosphate backbone of the nucleic acids can be
modified to generate peptide nucleic acids (Hyrup, et al., 1996. Bioorg. Med. Chem. 4: 5-23). As
used herein, the terms "peptide nucleic acids" or "PNAs" refer to nucleic acid mimics, e.g., DNA
mimics, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone
and only the four natural nucleobases are retained. The neutral backbone of PNAs has been
shown to allow for specific hybridization to DNA and RNA under conditions of low ionic
strength. The synthesis of PNA oligomers can be performed using standard solid phase peptide
synthesis protocols as described in Hyrup, et al., 1996, supra; Perry-O'Keefe, et al., 1996. Proc.
Natl. Acad. Sci. USA 93: 14670-14675.

PNAs of SECP can be used in therapeutic and diagnostic applications. For example,
PNAs can be used as antisense or antigene agents for sequence-specific modulation of gene
expression by, e.g., inducing transcription or translation arrest or inhibiting replication. PNAs of
SECP can also be used, e.g., in the analysis of single base pair mutations in a gene by, e.g., PNA
directed PCR clamping; as artificial restriction enzymes when used in combination with other
enzymes, e.g., S1 nucleases (see, Hyrup, et al., 1996, supra); or as probes or primers for DNA
sequence and hybridization (see, Hyrup, et al., 1996; Perry-O'Keefe, et al., 1996, supra).

In another embodiment, PNAs of SECP can be modified, e.g., to enhance their stability
or cellular uptake, by attaching lipophilic or other helper groups to PNA, by the formation of
PNA-DNA chimeras, or by the use of liposomes or other techniques of drug delivery known in
the art. For example, PNA-DNA chimeras of SECP can be generated that may combine the
advantageous properties of PNA and DNA. Such chimeras allow DNA recognition enzymes,
e.g., RNase H and DNA polymerases, to interact with the DNA portion while the PNA portion
would provide high binding affinity and specificity. PNA-DNA chimeras can be linked using
linkers of appropriate lengths selected in terms of base stacking, number of bonds between the
nucleobases, and orientation (see, Hyrup, et al., 1996, supra). The synthesis of PNA-DNA chimeras
can be performed as described in Finn, et al. (1996. Nucl. Acids Res. 24: 3357-3363). For
example, a DNA chain can be synthesized on a solid support using standard phosphoramidite
coupling chemistry, and modified nucleoside analogs, e.g.,
5'-(4-methoxytrityl)amino-5'-deoxy-thymidine phosphoramidite, can be used between the PNA
and the 5' end of DNA (Mag, et al., 1989. Nucl. Acid Res. 17: 5973-5988). PNA monomers are
then coupled in a stepwise manner to produce a chimeric molecule with a 5' PNA segment and a
3' DNA segment (see, Finn, et al., 1996, supra). Alternatively, chimeric molecules can be
synthesized with a 5' DNA segment and a 3' PNA segment. See, e.g., Petersen, et al., 1975.
Bioorg. Med. Chem. Lett. 5:1119-11124.

In other embodiments, the oligonucleotide may include other appended groups such as
peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport across the
cell membrane (see, e.g., Letsinger, et al., 1989. Proc. Natl. Acad. Sci. U.S.A. 86: 6553-6556;
Lemaitre, et al., 1987. Proc. Natl. Acad. Sci. 84: 648-652; PCT Publication No. WO88/09810) or
the blood-brain barrier (see, e.g., PCT Publication No. WO 89/10134). In addition,
oligonucleotides can be modified with hybridization-triggered cleavage agents (see, e.g., Krol, et
al., 1988. BioTechniques 6:958-976) or intercalating agents (see, e.g., Zon, 1988. Pharm. Res. 5:
539-549). To this end, the oligonucleotide may be conjugated to another molecule, e.g., a
peptide, a hybridization-triggered cross-linking agent, a transport agent, a hybridization-triggered
cleavage agent, and the like.



Characterization of SECP Polypeptides 

A polypeptide according to the invention includes a polypeptide including the amino acid
sequence of SECP polypeptides whose sequences are provided in SEQ ID NO:2, 4, 6, 8, 10, 12,
14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55, and/or 57.
The invention also includes a mutant or variant protein any of whose residues may be changed
from the corresponding residues shown in SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45,
47, 49, 51, 53, 55, and/or 57 while still encoding a
protein that maintains its SECP activities and physiological functions, or a functional fragment
thereof.

In general, a SECP variant that preserves SECP-like function includes any variant in 
which residues at a particular position in the sequence have been substituted by other amino 
acids, and further includes the possibility of inserting an additional residue or residues between
two residues of the parent protein as well as the possibility of deleting one or more residues from 
the parent sequence. Any amino acid substitution, insertion, or deletion is encompassed by the 
invention. In favorable circumstances, the substitution is a conservative substitution as defined 
above. 

One aspect of the invention pertains to isolated SECP proteins, and biologically-active
portions thereof, or derivatives, fragments, analogs or homologs thereof. Also provided are
polypeptide fragments suitable for use as immunogens to raise anti-SECP antibodies. In one
embodiment, native SECP proteins can be isolated from cells or tissue sources by an appropriate
purification scheme using standard protein purification techniques. In another embodiment,
SECP proteins are produced by recombinant DNA techniques. As an alternative to recombinant
expression, a SECP protein or polypeptide can be synthesized chemically using standard peptide
synthesis techniques.

An "isolated" or "purified" polypeptide or protein or biologically-active portion thereof is
substantially free of cellular material or other contaminating proteins from the cell or tissue
source from which the SECP protein is derived, or substantially free from chemical precursors or
other chemicals when chemically synthesized. The language "substantially free of cellular
material" includes preparations of SECP proteins in which the protein is separated from cellular
components of the cells from which it is isolated or recombinantly-produced. In one
embodiment, the language "substantially free of cellular material" includes preparations of SECP
proteins having less than about 30% (by dry weight) of non-SECP proteins (also referred to
herein as a "contaminating protein"), more preferably less than about 20% of non-SECP proteins,
still more preferably less than about 10% of non-SECP proteins, and most preferably less than
about 5% of non-SECP proteins. When the SECP protein or biologically-active portion thereof
is recombinantly-produced, it is also preferably substantially free of culture medium, i.e., culture
medium represents less than about 20%, more preferably less than about 10%, and most
preferably less than about 5% of the volume of the SECP protein preparation.

The phrase "substantially free of chemical precursors or other chemicals" includes 
preparations of SECP protein in which the protein is separated from chemical precursors or other
chemicals that are involved in the synthesis of the protein. In one embodiment, the language 
"substantially free of chemical precursors or other chemicals" includes preparations of SECP 
protein having less than about 30% (by dry weight) of chemical precursors or non-SECP 
chemicals, more preferably less than about 20% chemical precursors or non-SECP chemicals, 
still more preferably less than about 10% chemical precursors or non-SECP chemicals, and most
preferably less than about 5% chemical precursors or non-SECP chemicals. 

Biologically-active portions of a SECP protein include peptides comprising amino acid 
sequences sufficiently homologous to or derived from the amino acid sequence of the SECP 
protein which include fewer amino acids than the full-length SECP proteins, and exhibit at least 
one activity of a SECP protein. Typically, biologically-active portions comprise a domain or
motif with at least one activity of the SECP protein. A biologically-active portion of a SECP 
protein can be a polypeptide which is, for example, 10, 25, 50, 100 or more amino acids in 
length. 

A biologically-active portion of a SECP protein of the invention may contain at least one 
of the above-identified conserved domains. Moreover, other biologically active portions, in
which other regions of the protein are deleted, can be prepared by recombinant techniques and 
evaluated for one or more of the functional activities of a native SECP protein. 

In an embodiment, the SECP protein has an amino acid sequence shown in any of SEQ
ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56. In other
embodiments, the SECP protein is substantially homologous to any of SEQ ID NO:1, 3, 5, 7, 9,
11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 and retains the functional activity of the
protein of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56
yet differs in amino acid sequence due to natural allelic variation or mutagenesis, as described in
detail below. Accordingly, in another embodiment, the SECP protein is a protein that comprises
an amino acid sequence at least about 45% homologous, and more preferably about 55, 65, 70,
75, 80, 85, 90, 95, 98 or even 99% homologous to the amino acid sequence of any of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 and retains the functional
activity of the SECP proteins of the corresponding polypeptide having the sequence of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56.

Determining Homology Between Two or More Sequences 

To determine the percent homology of two amino acid sequences or of two nucleic acids,
the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the
sequence of a first amino acid or nucleic acid sequence for optimal alignment with a second
amino acid or nucleic acid sequence). The amino acid residues or nucleotides at corresponding amino
acid positions or nucleotide positions are then compared. When a position in the first sequence
is occupied by the same amino acid residue or nucleotide as the corresponding position in the
second sequence, then the molecules are homologous at that position (i.e., as used herein amino
acid or nucleic acid "homology" is equivalent to amino acid or nucleic acid "identity").

The nucleic acid sequence homology may be determined as the degree of identity
between two sequences. The homology may be determined using computer programs known in
the art, such as GAP software provided in the GCG program package. See, Needleman and
Wunsch, 1970. J. Mol. Biol. 48: 443-453. Using GCG GAP software with the following settings
for nucleic acid sequence comparison: GAP creation penalty of 5.0 and GAP extension penalty
of 0.3, the coding region of the analogous nucleic acid sequences referred to above exhibits a
degree of identity preferably of at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99%, with the
CDS (encoding) part of the DNA sequence shown in SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40,
42, 44, 46, 48, 50, 52, 54 and/or 56.
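
The global alignment step can be illustrated with a short Python sketch of the Needleman-Wunsch dynamic-programming method. This is not the GCG GAP program: for brevity it uses a single linear gap penalty instead of separate creation (5.0) and extension (0.3) penalties, and the match, mismatch and gap values, the function name `needleman_wunsch`, and the test sequences are assumptions chosen only for illustration.

```python
# Simplified Needleman-Wunsch global alignment with a LINEAR gap penalty.
# Assumption: scoring values are illustrative, not the GCG GAP defaults.
def needleman_wunsch(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap):
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


aligned_a, aligned_b, total = needleman_wunsch("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
print(total)
```

An affine scheme such as the one used by GAP would track separate matrices for gap opening and gap extension; the linear penalty here keeps the sketch short.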

The term "sequence identity" refers to the degree to which two polynucleotide or
polypeptide sequences are identical on a residue-by-residue basis over a particular region of
comparison. The term "percentage of sequence identity" is calculated by comparing two
optimally aligned sequences over that region of comparison, determining the number of positions
at which the identical nucleic acid base (e.g., A, T, C, G, U, or I, in the case of nucleic acids)
occurs in both sequences to yield the number of matched positions, dividing the number of
matched positions by the total number of positions in the region of comparison (i.e., the window
size), and multiplying the result by 100 to yield the percentage of sequence identity. The term
"substantial identity" as used herein denotes a characteristic of a polynucleotide sequence,
wherein the polynucleotide comprises a sequence that has at least 80 percent sequence identity,
preferably at least 85 percent identity and often 90 to 95 percent sequence identity, more usually
at least 99 percent sequence identity as compared to a reference sequence over a comparison
region.
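
The percentage calculation described above can be written out as a brief Python sketch. It assumes the two sequences have already been optimally aligned to equal length, with "-" marking gap positions; the function names and the example strings are hypothetical.

```python
# Percentage of sequence identity over a comparison window:
# (matched positions / total positions in the window) * 100.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # A gap character never counts as a match.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def is_substantially_identical(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """Assumed helper: apply the 80 percent lower bound named in the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold


pid = percent_identity("ACGTACGTAC", "ACGTTCGTAC")  # one mismatch in ten positions
print(pid)
```

Here the window size is simply the length of the aligned strings; a different choice of comparison region changes the denominator and therefore the result.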

Chimeric and Fusion Proteins

The invention also provides SECP chimeric or fusion proteins. As used herein, a SECP
"chimeric protein" or "fusion protein" comprises a SECP polypeptide operatively-linked to a
non-SECP polypeptide. A "SECP polypeptide" refers to a polypeptide having an amino acid
sequence corresponding to a SECP protein shown in SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18,
41, 43, 45, 47, 49, 51, 53, 55, and/or 57, whereas a
"non-SECP polypeptide" refers to a polypeptide having an amino acid sequence corresponding to
a protein that is not substantially homologous to the SECP protein (e.g., a protein that is different
from the SECP protein and that is derived from the same or a different organism). Within a
SECP fusion protein the SECP polypeptide can correspond to all or a portion of a SECP protein.
In one embodiment, a SECP fusion protein comprises at least one biologically-active portion of a
SECP protein. In another embodiment, a SECP fusion protein comprises at least two
biologically-active portions of a SECP protein. In yet another embodiment, a SECP fusion
protein comprises at least three biologically-active portions of a SECP protein. Within the fusion
protein, the term "operatively-linked" is intended to indicate that the SECP polypeptide and the
non-SECP polypeptide are fused in-frame with one another. The non-SECP polypeptide can be
fused to the amino-terminus or carboxyl-terminus of the SECP polypeptide.

In one embodiment, the fusion protein is a GST-SECP fusion protein in which the SECP 
sequences are fused to the carboxyl-terminus of the GST (glutathione S-transferase) sequences. 
Such fusion proteins can facilitate the purification of recombinant SECP polypeptides. 

In another embodiment, the fusion protein is a SECP protein containing a heterologous
signal sequence at its amino-terminus. In certain host cells (e.g., mammalian host cells),
expression and/or secretion of SECP can be increased through use of a heterologous signal
sequence.

In yet another embodiment, the fusion protein is a SECP-immunoglobulin fusion protein
in which the SECP sequences are fused to sequences derived from a member of the
immunoglobulin protein family. The SECP-immunoglobulin fusion proteins of the invention can
be incorporated into pharmaceutical compositions and administered to a subject to inhibit an
interaction between a SECP ligand and a SECP protein on the surface of a cell, to thereby
suppress SECP-mediated signal transduction in vivo. The SECP-immunoglobulin fusion
proteins can be used to affect the bioavailability of a SECP cognate ligand. Inhibition of the
SECP ligand/SECP interaction may be useful therapeutically for both the treatment of
proliferative and differentiative disorders, as well as modulating (e.g., promoting or inhibiting)
cell survival. Moreover, the SECP-immunoglobulin fusion proteins of the invention can be used
as immunogens to produce anti-SECP antibodies in a subject, to purify SECP ligands, and in
screening assays to identify molecules that inhibit the interaction of SECP with a SECP ligand.

A SECP chimeric or fusion protein of the invention can be produced by standard
recombinant DNA techniques. For example, DNA fragments coding for the different
polypeptide sequences are ligated together in-frame in accordance with conventional techniques,
e.g., by employing blunt-ended or stagger-ended termini for ligation, restriction enzyme
digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline
phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another
embodiment, the fusion gene can be synthesized by conventional techniques including automated
DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using
anchor primers that give rise to complementary overhangs between two consecutive gene
fragments that can subsequently be annealed and reamplified to generate a chimeric gene
sequence (see, e.g., Ausubel, et al. (eds.) Current Protocols in Molecular Biology, John
Wiley & Sons, 1992). Moreover, many expression vectors are commercially available that
already encode a fusion moiety (e.g., a GST polypeptide). A SECP-encoding nucleic acid can be
cloned into such an expression vector such that the fusion moiety is linked in-frame to the SECP
protein.

SECP Agonists and Antagonists 

The invention also pertains to variants of the SECP proteins that function as either SECP
agonists (i.e., mimetics) or as SECP antagonists. Variants of the SECP protein can be generated
by mutagenesis (e.g., discrete point mutation or truncation of the SECP protein). An agonist of a
SECP protein can retain substantially the same, or a subset of, the biological activities of the
naturally-occurring form of a SECP protein. An antagonist of a SECP protein can inhibit one or
more of the activities of the naturally occurring form of a SECP protein by, for example,
competitively binding to a downstream or upstream member of a cellular signaling cascade
which includes the SECP protein. Thus, specific biological effects can be elicited by treatment
with a variant of limited function. In one embodiment, treatment of a subject with a variant
having a subset of the biological activities of the naturally occurring form of the protein has
fewer side effects in a subject relative to treatment with the naturally occurring form of the SECP
proteins.

Variants of the SECP proteins that function as either SECP agonists (i.e., mimetics) or as
SECP antagonists can be identified by screening combinatorial libraries of mutants (e.g.,
truncation mutants) of the SECP proteins for SECP protein agonist or antagonist activity. In one
embodiment, a variegated library of SECP variants is generated by combinatorial mutagenesis at
the nucleic acid level and is encoded by a variegated gene library. A variegated library of SECP
variants can be produced by, for example, enzymatically-ligating a mixture of synthetic
oligonucleotides into gene sequences such that a degenerate set of potential SECP sequences is
expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for
phage display) containing the set of SECP sequences therein. There are a variety of methods
which can be used to produce libraries of potential SECP variants from a degenerate
oligonucleotide sequence. Chemical synthesis of a degenerate gene sequence can be performed
in an automatic DNA synthesizer, and the synthetic gene then ligated into an appropriate
expression vector. Use of a degenerate set of genes allows for the provision, in one mixture, of
all of the sequences encoding the desired set of potential SECP sequences. Methods for
synthesizing degenerate oligonucleotides are well-known within the art. See, e.g., Narang, 1983.
Tetrahedron 39: 3; Itakura, et al., 1984. Annu. Rev. Biochem. 53: 323; Itakura, et al., 1984.
Science 198: 1056; Ike, et al., 1983. Nucl. Acids Res. 11: 477.
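
To make the notion of a degenerate oligonucleotide concrete, the following Python sketch enumerates every discrete sequence encoded by an oligonucleotide written with standard IUPAC ambiguity codes. The function name `expand_degenerate` and the example oligonucleotides are hypothetical; the sketch illustrates the size of a variegated library rather than any particular synthesis method.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo: str) -> list:
    """List every concrete sequence represented by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


# An NNK codon (N = any base, K = G or T) is one commonly used degenerate codon.
nnk_codons = expand_degenerate("NNK")
print(len(nnk_codons))
print(expand_degenerate("AYG"))
```

The library size grows as the product of the number of choices at each position, which is why a single degenerate synthesis can supply the whole set of potential sequences in one mixture.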

Polypeptide Libraries

In addition, libraries of fragments of the SECP protein coding sequences can be used to
generate a variegated population of SECP fragments for screening and subsequent selection of
variants of a SECP protein. In one embodiment, a library of coding sequence fragments can be
generated by treating a double-stranded PCR fragment of a SECP coding sequence with a
nuclease under conditions wherein nicking occurs only about once per molecule, denaturing the
double-stranded DNA, renaturing the DNA to form double-stranded DNA that can include
sense/antisense pairs from different nicked products, removing single-stranded portions from
reformed duplexes by treatment with S1 nuclease, and ligating the resulting fragment library into
an expression vector. By this method, expression libraries can be derived that encode
amino-terminal and internal fragments of various sizes of the SECP proteins.

Various techniques are known in the art for screening gene products of combinatorial
libraries made by point mutations or truncation, and for screening cDNA libraries for gene
products having a selected property. Such techniques are adaptable for rapid screening of the
gene libraries generated by the combinatorial mutagenesis of SECP proteins. The most widely
used techniques, which are amenable to high throughput analysis, for screening large gene
libraries typically include cloning the gene library into replicable expression vectors,
transforming appropriate cells with the resulting library of vectors, and expressing the
combinatorial genes under conditions in which detection of a desired activity facilitates isolation
of the vector encoding the gene whose product was detected. Recursive ensemble mutagenesis
(REM), a new technique that enhances the frequency of functional mutants in the libraries, can
be used in combination with the screening assays to identify SECP variants. See, e.g., Arkin and
Youvan, 1992. Proc. Natl. Acad. Sci. USA 89: 7811-7815; Delgrave, et al., 1993. Protein
Engineering 6: 327-331.

Anti-SECP Antibodies 

The invention encompasses antibodies and antibody fragments, such as Fab or (Fab)2, that
bind immunospecifically to any of the SECP polypeptides of said invention.

An isolated SECP protein, or a portion or fragment thereof, can be used as an immunogen
to generate antibodies that bind to SECP polypeptides using standard techniques for polyclonal
and monoclonal antibody preparation. The full-length SECP proteins can be used or,
alternatively, the invention provides antigenic peptide fragments of SECP proteins for use as
immunogens. An antigenic SECP peptide comprises at least 4 amino acid residues of the
amino acid sequence shown in SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51,
53, 55, and/or 57, and encompasses an epitope of SECP
such that an antibody raised against the peptide forms a specific immune complex with SECP.
Preferably, the antigenic peptide comprises at least 6, 8, 10, 15, 20, or 30 amino acid residues.
Longer antigenic peptides are sometimes preferable over shorter antigenic peptides, depending
on use and according to methods well known to someone skilled in the art.

In certain embodiments of the invention, at least one epitope encompassed by the
antigenic peptide is a region of SECP that is located on the surface of the protein (e.g., a
hydrophilic region). As a means for targeting antibody production, hydropathy plots showing
regions of hydrophilicity and hydrophobicity may be generated by any method well known in the
art, including, for example, the Kyte-Doolittle or the Hopp-Woods methods, either with or
without Fourier transformation (see, e.g., Hopp and Woods, 1981. Proc. Natl. Acad. Sci. USA 78:
3824-3828; Kyte and Doolittle, 1982. J. Mol. Biol. 157: 105-142, each incorporated herein by
reference in their entirety).
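
A hydropathy profile of the kind referred to above can be sketched in a few lines of Python using the published Kyte-Doolittle scale, in which positive values are hydrophobic and negative values are hydrophilic. The window length, the function names, and the test peptide are assumptions made for illustration; no Fourier transformation is applied.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7) -> list:
    """Mean hydropathy of each full window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[residue] for residue in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def most_hydrophilic_window(seq: str, window: int = 7) -> tuple:
    """Return (start index, mean score) of the most hydrophilic window."""
    profile = hydropathy_profile(seq, window)
    start = min(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


# Made-up peptide: a hydrophobic stretch followed by a hydrophilic stretch.
profile = hydropathy_profile("IIIIIKKKKK", window=5)
start, score = most_hydrophilic_window("IIIIIKKKKK", window=5)
print(start, score)
```

Windows with strongly negative mean scores are candidates for surface-exposed, hydrophilic regions, although any such prediction would need experimental confirmation.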
As disclosed herein, SECP protein sequences of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16,
18, 41, 43, 45, 47, 49, 51, 53, 55, and/or 57, or
derivatives, fragments, analogs, or homologs thereof, may be utilized as immunogens in the
generation of antibodies that immunospecifically-bind these protein components. The term
"antibody" as used herein refers to immunoglobulin molecules and immunologically-active
portions of immunoglobulin molecules, i.e., molecules that contain an antigen binding site that
specifically-binds (immunoreacts with) an antigen, such as SECP. Such antibodies include, but
are not limited to, polyclonal, monoclonal, chimeric, single chain, Fab and F(ab')2 fragments, and
an Fab expression library. In a specific embodiment, antibodies to human SECP proteins are
disclosed. Various procedures known within the art may be used for the production of
polyclonal or monoclonal antibodies to a SECP protein sequence of SEQ ID NO:2, 4, 6, 8, 10,
12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and/or 57, or a derivative, fragment, analog, or
homolog thereof.

For the production of polyclonal antibodies, various suitable host animals (e.g., rabbit,
goat, mouse or other mammal) may be immunized by injection with the native protein, or a
synthetic variant thereof, or a derivative of the foregoing. An appropriate immunogenic
preparation can contain, for example, recombinantly-expressed SECP protein or a
chemically-synthesized SECP polypeptide. The preparation can further include an adjuvant. Various
adjuvants used to increase the immunological response include, but are not limited to, Freund's
(complete and incomplete), mineral gels (e.g., aluminum hydroxide), surface active substances
(e.g., lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, dinitrophenol, etc.),
human adjuvants such as Bacille Calmette-Guerin and Corynebacterium parvum, or similar
immunostimulatory agents. If desired, the antibody molecules directed against SECP can be
isolated from the mammal (e.g., from the blood) and further purified by well known techniques,
such as protein A chromatography to obtain the IgG fraction.

The term "monoclonal antibody" or "monoclonal antibody composition", as used herein,
refers to a population of antibody molecules that contain only one species of an antigen binding
site capable of immunoreacting with a particular epitope of SECP. A monoclonal antibody
composition thus typically displays a single binding affinity for a particular SECP protein with
which it immunoreacts. For preparation of monoclonal antibodies directed towards a particular
SECP protein, or derivatives, fragments, analogs or homologs thereof, any technique that
provides for the production of antibody molecules by continuous cell line culture may be
utilized. Such techniques include, but are not limited to, the hybridoma technique (see, e.g.,
Kohler & Milstein, 1975. Nature 256: 495-497); the trioma technique; the human B-cell
hybridoma technique (see, e.g., Kozbor, et al., 1983. Immunol. Today 4: 72) and the EBV
hybridoma technique to produce human monoclonal antibodies (see, e.g., Cole, et al., 1985. In:
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Human
monoclonal antibodies may be utilized in the practice of the invention and may be produced by
using human hybridomas (see, e.g., Cote, et al., 1983. Proc. Natl. Acad. Sci. USA 80: 2026-2030)
or by transforming human B-cells with Epstein Barr Virus in vitro (see, e.g., Cole, et al., 1985.
In: Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Each of
the above citations is incorporated herein by reference in its entirety.

According to the invention, techniques can be adapted for the production of single-chain
antibodies specific to a SECP protein (see, e.g., U.S. Patent No. 4,946,778). In addition,
methods can be adapted for the construction of expression libraries (see, e.g., Huse, et al.,
1989. Science 246: 1275-1281) to allow rapid and effective identification of monoclonal Fab
fragments with the desired specificity for a SECP protein or derivatives, fragments, analogs or
homologs thereof. Non-human antibodies can be "humanized" by techniques well known in the
art. See, e.g., U.S. Patent No. 5,225,539. Antibody fragments that contain the idiotypes to a
SECP protein may be produced by techniques known in the art including, but not limited to:
(i) an F(ab')2 fragment produced by pepsin digestion of an antibody molecule; (ii) an Fab fragment
generated by reducing the disulfide bridges of an F(ab')2 fragment; (iii) an Fab fragment generated
by the treatment of the antibody molecule with papain and a reducing agent; and (iv) Fv
fragments.

Additionally, recombinant anti-SECP antibodies, such as chimeric and humanized
monoclonal antibodies, comprising both human and non-human portions, which can be made
using standard recombinant DNA techniques, are within the scope of the invention. Such
chimeric and humanized monoclonal antibodies can be produced by recombinant DNA
techniques known in the art, for example using methods described in International Application
No. PCT/US86/02269; European Patent Application No. 184,187; European Patent Application
No. 171,496; European Patent Application No. 173,494; PCT International Publication No. WO
86/01533; U.S. Patent No. 4,816,567; U.S. Pat. No. 5,225,539; European Patent Application No.
125,023; Better, et al., 1988. Science 240: 1041-1043; Liu, et al., 1987. Proc. Natl. Acad. Sci.
USA 84: 3439-3443; Liu, et al., 1987. J. Immunol. 139: 3521-3526; Sun, et al., 1987. Proc. Natl.
Acad. Sci. USA 84: 214-218; Nishimura, et al., 1987. Cancer Res. 47: 999-1005; Wood, et al.,
1985. Nature 314: 446-449; Shaw, et al., 1988. J. Natl. Cancer Inst. 80: 1553-1559;
Morrison, 1985. Science 229: 1202-1207; Oi, et al., 1986. BioTechniques 4: 214; Jones, et al.,
1986. Nature 321: 522-525; Verhoeyen, et al., 1988. Science 239: 1534; and Beidler, et al.,
1988. J. Immunol. 141: 4053-4060. Each of the above citations is incorporated herein by
reference in its entirety.

In one embodiment, methods for the screening of antibodies that possess the desired 
specificity include, but are not limited to, enzyme-linked immunosorbent assay (ELISA) and 
other immunologically-mediated techniques known within the art. In a specific embodiment,
selection of antibodies that are specific to a particular domain of a SECP protein is facilitated by 
generation of hybridomas that bind to the fragment of a SECP protein possessing such a domain. 
Thus, antibodies that are specific for a desired domain within a SECP protein, or derivatives, 
fragments, analogs or homologs thereof, are also provided herein. 

Anti-SECP antibodies may be used in methods known within the art relating to the
localization and/or quantitation of a SECP protein (e.g., for use in measuring levels of the SECP
protein within appropriate physiological samples, for use in diagnostic methods, for use in
imaging the protein, and the like). In a given embodiment, antibodies for SECP proteins, or
derivatives, fragments, analogs or homologs thereof, that contain the antibody derived binding
domain, are utilized as pharmacologically-active compounds (hereinafter "Therapeutics").

An anti-SECP antibody (e.g., monoclonal antibody) can be used to isolate a SECP 
polypeptide by standard techniques, such as affinity chromatography or immunoprecipitation. 
An anti-SECP antibody can facilitate the purification of natural SECP polypeptide from cells and 
of recombinantly-produced SECP polypeptide expressed in host cells. Moreover, an anti-SECP
antibody can be used to detect SECP protein (e.g., in a cellular lysate or cell supernatant) in order
to evaluate the abundance and pattern of expression of the SECP protein. Anti-SECP antibodies
can be used diagnostically to monitor protein levels in tissue as part of a clinical testing
procedure, e.g., to determine the efficacy of a given treatment regimen. Detection
can be facilitated by coupling (i.e., physically linking) the antibody to a detectable substance.
Examples of detectable substances include various enzymes, prosthetic groups, fluorescent
materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples
of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or
acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin
and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein,
fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or
phycoerythrin; an example of a luminescent material is luminol; examples of
bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable
radioactive materials include ¹²⁵I, ¹³¹I, ³⁵S or ³H.



SECP Recombinant Expression Vectors and Host Cells

Another aspect of the invention pertains to vectors, preferably expression vectors, 
containing a nucleic acid encoding a SECP protein, or derivatives, fragments, analogs or 
homologs thereof. As used herein, the term "vector" refers to a nucleic acid molecule capable of 
transporting another nucleic acid to which it has been linked. One type of vector is a "plasmid",
which refers to a circular double stranded DNA loop into which additional DNA segments can 
be ligated. Another type of vector is a viral vector, wherein additional DNA segments can be 
ligated into the viral genome. Certain vectors are capable of autonomous replication in a host 
cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication
and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are
integrated into the genome of a host cell upon introduction into the host cell, and thereby are 
replicated along with the host genome. Moreover, certain vectors are capable of directing the 
expression of genes to which they are operatively-linked. Such vectors are referred to herein as
"expression vectors". In general, expression vectors of utility in recombinant DNA techniques
are often in the form of plasmids. In the present Specification, "plasmid" and "vector" can be
used interchangeably, as the plasmid is the most commonly used form of vector. However, the 
invention is intended to include such other forms of expression vectors, such as viral vectors 
(e.g., replication defective retroviruses, adenoviruses and adeno-associated viruses), which serve 
equivalent functions. 

The recombinant expression vectors of the invention comprise a nucleic acid of the
invention in a form suitable for expression of the nucleic acid in a host cell, which means that the
recombinant expression vectors include one or more regulatory sequences, selected on the basis
of the host cells to be used for expression, that are operatively-linked to the nucleic acid sequence
to be expressed. Within a recombinant expression vector, "operably-linked" is intended to mean
that the nucleotide sequence of interest is linked to the regulatory sequence(s) in a manner that
allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation
system or in a host cell when the vector is introduced into the host cell).

The phrase "regulatory sequence" is intended to include promoters, enhancers and other
expression control elements (e.g., polyadenylation signals). Such regulatory sequences are
described, for example, in Goeddel, Gene Expression Technology: Methods in
Enzymology 185, Academic Press, San Diego, Calif. (1990). Regulatory sequences include
those that direct constitutive expression of a nucleotide sequence in many types of host cell and
those that direct expression of the nucleotide sequence only in certain host cells (e.g.,
tissue-specific regulatory sequences). It will be appreciated by those skilled in the art that the
design of the expression vector can depend on such factors as the choice of the host cell to be
transformed, the level of expression of protein desired, etc. The expression vectors of the
invention can be introduced into host cells to thereby produce proteins or peptides, including
fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., SECP proteins,
mutant forms of SECP proteins, fusion proteins, etc.).

The recombinant expression vectors of the invention can be designed for expression of
SECP proteins in prokaryotic or eukaryotic cells. For example, SECP proteins can be expressed
in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors),
yeast cells or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene
Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif.
(1990). Alternatively, the recombinant expression vector can be transcribed and translated in
vitro, for example using T7 promoter regulatory sequences and T7 polymerase.

Expression of proteins in prokaryotes is most often carried out in Escherichia coli with
vectors containing constitutive or inducible promoters directing the expression of either fusion or
non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein,
usually to the amino terminus of the recombinant protein. Such fusion vectors typically serve
three purposes: (i) to increase expression of recombinant protein; (ii) to increase the solubility of
the recombinant protein; and (iii) to aid in the purification of the recombinant protein by acting
as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage
site is introduced at the junction of the fusion moiety and the recombinant protein to enable
separation of the recombinant protein from the fusion moiety subsequent to purification of the
fusion protein. Enzymes that cleave at such sites, and their cognate recognition sequences,
include Factor Xa, thrombin, and enterokinase. Typical fusion expression vectors include pGEX
(Pharmacia Biotech Inc; Smith and Johnson, 1988. Gene 67: 31-40), pMAL (New England Biolabs,
Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) that fuse glutathione S-transferase (GST),
maltose E binding protein, or protein A, respectively, to the target recombinant protein.
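
By way of illustration only, the separation of a recombinant protein from its fusion moiety at an engineered cleavage site can be sketched computationally. The recognition motifs below (IEGR for Factor Xa, LVPRGS for thrombin, DDDDK for enterokinase) are commonly cited consensus sites and are assumptions not stated in this Specification; the function names and the example sequences are likewise hypothetical.

```python
# Commonly cited consensus recognition motifs (assumed, not taken from the
# Specification). Each entry maps a protease to (motif, offset), where the
# offset is the number of motif residues that remain on the N-terminal side
# of the scissile bond.
PROTEASE_SITES = {
    "Factor Xa": ("IEGR", 4),       # cleaves after the Arg
    "thrombin": ("LVPRGS", 4),      # cleaves between Arg and Gly
    "enterokinase": ("DDDDK", 5),   # cleaves after the Lys
}


def find_cleavage_sites(protein):
    """Return {protease: [cut positions]} for a one-letter protein sequence.

    A cut position p means the bond between residues p-1 and p (0-based).
    """
    hits = {}
    for name, (motif, offset) in PROTEASE_SITES.items():
        positions = []
        start = protein.find(motif)
        while start != -1:
            positions.append(start + offset)
            start = protein.find(motif, start + 1)
        hits[name] = positions
    return hits


def release_target(fusion, protease):
    """Split a fusion protein at the first site recognized by `protease`.

    Returns (fusion_moiety_with_linker, released_protein).
    """
    cuts = find_cleavage_sites(fusion)[protease]
    if not cuts:
        raise ValueError("no " + protease + " site found in the fusion protein")
    cut = cuts[0]
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    # Hypothetical fusion: short tag + Factor Xa site + hypothetical target.
    fusion = "MSPILGYW" + "IEGR" + "MKTAYIAK"
    tag, target = release_target(fusion, "Factor Xa")
    print(tag, target)
```

In this sketch the fusion moiety retains the recognition motif when the cut falls at the end of the motif, so the released protein begins at its own first residue; a practical design would also check that the motif does not occur inside the target protein.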

Examples of suitable inducible non-fusion Escherichia coli expression vectors include
pTrc (Amann et al., (1988) Gene 69:301-315) and pET 11d (Studier, et al., Gene Expression
Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990)
60-89).

One strategy to maximize recombinant protein expression in Escherichia coli is to
express the protein in a host bacterium with an impaired capacity to proteolytically-cleave the
recombinant protein. See, e.g., Gottesman, Gene Expression Technology: Methods in
Enzymology 185, Academic Press, San Diego, Calif. (1990) 119-128. Another strategy is to
alter the nucleic acid sequence of the nucleic acid to be inserted into an expression vector so that
the individual codons for each amino acid are those preferentially utilized in Escherichia coli
(see, e.g., Wada, et al., 1992. Nucl. Acids Res. 20: 2111-2118). Such alteration of nucleic acid
sequences of the invention can be carried out by standard DNA synthesis techniques.
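
The codon-replacement strategy described above can be illustrated with a minimal sketch. It assumes the standard genetic code and an illustrative table of one preferred codon per amino acid; that table is a placeholder chosen for the example rather than measured Escherichia coli usage data, and in practice it would be replaced with tabulated codon-usage frequencies. All names in the code are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Illustrative preferred codon per amino acid (an assumption for this sketch,
# not a measured usage table).
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}

# Guard: every replacement codon must encode the amino acid it stands for.
assert all(GENETIC_CODE[codon] == aa for aa, codon in PREFERRED.items())


def codons(dna):
    """Split an in-frame coding sequence into codons."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    return "".join(GENETIC_CODE[c] for c in codons(dna))


def recode(dna):
    """Replace each codon with the preferred synonymous codon."""
    return "".join(PREFERRED[GENETIC_CODE[c]] for c in codons(dna))


if __name__ == "__main__":
    original = "ATGTTAAGAGGATAG"  # hypothetical coding sequence
    optimized = recode(original)
    print(optimized)
    # The encoded protein must be unchanged by the recoding.
    assert translate(optimized) == translate(original)
```

Only synonymous substitutions are made, so the encoded polypeptide is unchanged; the recoded sequence would then be prepared by DNA synthesis as noted above.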

In another embodiment, the SECP expression vector is a yeast expression vector.
Examples of vectors for expression in yeast Saccharomyces cerevisiae include pYepSecl
(Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30:
933-943), pJRY88 (Schultz et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation,
San Diego, Calif.), and picZ (InVitrogen, Corp.; San Diego, Calif.).

Alternatively, SECP can be expressed in insect cells using baculovirus expression
vectors. Baculovirus vectors available for expression of proteins in cultured insect cells (e.g.,
SF9 cells) include the pAc series (Smith, et al., 1983. Mol. Cell. Biol. 3: 2156-2165) and the pVL
series (Luckow and Summers, 1989. Virology 170: 31-39).

In yet another embodiment, a nucleic acid of the invention is expressed in mammalian
cells using a mammalian expression vector. Examples of mammalian expression vectors include
pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO J. 6:
187-195). When used in mammalian cells, the expression vector's control functions are often
provided by viral regulatory elements. For example, commonly used promoters are derived from
polyoma, adenovirus 2, cytomegalovirus, and simian virus 40 (SV 40). For other suitable
expression systems for both prokaryotic and eukaryotic cells see, e.g., Chapters 16 and 17 of
Sambrook, et al., Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor
Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.

In another embodiment, the recombinant mammalian expression vector is capable of
directing expression of the nucleic acid preferentially in a particular cell type (e.g.,
tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific
regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific
promoters include the albumin promoter (liver-specific; see, Pinkert, et al., 1987. Genes Dev. 1:
268-277), lymphoid-specific promoters (see, Calame and Eaton, 1988. Adv. Immunol. 43:
235-275), in particular promoters of T cell receptors (see, Winoto and Baltimore, 1989. EMBO J.
8: 729-733) and immunoglobulins (see, Banerji, et al., 1983. Cell 33: 729-740; Queen and
Baltimore, 1983. Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament promoter;
see, Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86: 5473-5477), pancreas-specific
promoters (see, Edlund, et al., 1985. Science 230: 912-916), and mammary gland-specific
promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European Application
Publication No. 264,166). Developmentally-regulated promoters are also encompassed, e.g., the
murine hox promoters (Kessel and Gruss, 1990. Science 249: 374-379) and the α-fetoprotein
promoter (see, Camper and Tilghman, 1989. Genes Dev. 3: 537-546).

The invention further provides a recombinant expression vector comprising a DNA
molecule of the invention cloned into the expression vector in an antisense orientation. That is,
the DNA molecule is operatively-linked to a regulatory sequence in a manner that allows for
expression (by transcription of the DNA molecule) of an RNA molecule that is antisense to
SECP mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the antisense
orientation can be chosen that direct the continuous expression of the antisense RNA molecule in
a variety of cell types, for instance viral promoters and/or enhancers, or regulatory sequences can
be chosen that direct constitutive, tissue specific or cell type specific expression of antisense
RNA. The antisense expression vector can be in the form of a recombinant plasmid, phagemid
or attenuated virus in which antisense nucleic acids are produced under the control of a high
efficiency regulatory region, the activity of which can be determined by the cell type into which
the vector is introduced. For a discussion of the regulation of gene expression using antisense
genes see, e.g., Weintraub, et al., "Antisense RNA as a molecular tool for genetic analysis,"
Reviews-Trends in Genetics, Vol. 1(1) 1986.
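
The relationship between a coding sequence, its mRNA, and the transcript produced when the same DNA is cloned in the antisense orientation can be sketched as follows. This is a minimal illustration under the assumption that transcription reads the strand placed downstream of the promoter 5' to 3'; the function names and the example sequence are hypothetical and do not correspond to any SECP sequence.

```python
# Complement table for DNA bases.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Watson-Crick pairing for RNA bases.
_RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA strand written 5'->3'."""
    return dna.translate(_DNA_COMPLEMENT)[::-1]


def sense_transcript(coding_strand):
    """mRNA produced when the insert is cloned in the sense orientation."""
    return coding_strand.replace("T", "U")


def antisense_transcript(coding_strand):
    """RNA produced when the same insert is cloned in the antisense orientation."""
    return reverse_complement(coding_strand).replace("T", "U")


def is_fully_complementary(mrna, antisense):
    """True if the antisense RNA base-pairs with the mRNA along its full length."""
    if len(mrna) != len(antisense):
        return False
    # Strands pair antiparallel: compare the mRNA with the reversed antisense RNA.
    return all(_RNA_PAIR[a] == b for a, b in zip(mrna, reversed(antisense)))


if __name__ == "__main__":
    coding = "ATGGCCAAG"  # hypothetical coding-strand fragment
    mrna = sense_transcript(coding)
    anti = antisense_transcript(coding)
    print(mrna, anti)
    assert is_fully_complementary(mrna, anti)
```

The check at the end reflects the property relied on above: the RNA transcribed from the inverted insert is complementary to, and can therefore hybridize with, the sense mRNA.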

Another aspect of the invention pertains to host cells into which a recombinant
expression vector of the invention has been introduced. The terms "host cell" and "recombinant
host cell" are used interchangeably herein. It is understood that such terms refer not only to the
particular subject cell but also to the progeny or potential progeny of such a cell. Because certain
modifications may occur in succeeding generations due to either mutation or environmental
influences, such progeny may not, in fact, be identical to the parent cell, but are still included
within the scope of the term as used herein.

A host cell can be any prokaryotic or eukaryotic cell. For example, SECP protein can be 
expressed in bacterial cells such as Escherichia coli, insect cells, yeast or mammalian cells (such 
as Chinese hamster ovary cells (CHO) or COS cells). Other suitable host cells are known to 
those skilled in the art.

Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional 
transformation or transfection techniques. As used herein, the terms "transformation" and 
"transfection" are intended to refer to a variety of art-recognized techniques for introducing 
foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium
chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation.
Suitable methods for transforming or transfecting host cells can be found in Sambrook, et al.
(Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory,
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989), and other laboratory
manuals. 

For stable transfection of mammalian cells, it is known that, depending upon the 
expression vector and transfection technique used, only a small fraction of cells may integrate 
the foreign DNA into their genome. In order to identify and select these integrants, a gene that
encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into the host
cells along with the gene of interest. Various selectable markers include those that confer
resistance to drugs, such as G418, hygromycin and methotrexate. Nucleic acid encoding a
selectable marker can be introduced into a host cell on the same vector as that encoding SECP or
can be introduced on a separate vector. Cells stably-transfected with the introduced nucleic acid
can be identified by drug selection (e.g., cells that have incorporated the selectable marker gene
will survive, while the other cells die).

A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, can 
be used to produce (i.e., express) SECP protein. Accordingly, the invention further provides 
methods for producing SECP protein using the host cells of the invention. In one embodiment, 
the method comprises culturing the host cell of the invention (i.e., into which a recombinant
expression vector encoding SECP protein has been introduced) in a suitable medium such that
SECP protein is produced. In another embodiment, the method further comprises isolating 
SECP protein from the medium or the host cell. 

Transgenic Animals 

The host cells of the invention can also be used to produce non-human transgenic
animals. For example, in one embodiment, a host cell of the invention is a fertilized oocyte or an
embryonic stem cell into which SECP protein-coding sequences have been introduced. These
host cells can then be used to create non-human transgenic animals in which exogenous SECP
sequences have been introduced into their genome or homologous recombinant animals in which
endogenous SECP sequences have been altered. Such animals are useful for studying the
function and/or activity of SECP protein and for identifying and/or evaluating modulators of
SECP protein activity. As used herein, a "transgenic animal" is a non-human animal, preferably
a mammal, more preferably a rodent such as a rat or mouse, in which one or more of the cells of
the animal includes a transgene. Other examples of transgenic animals include non-human
primates, sheep, dogs, cows, goats, chickens, amphibians, etc.

A transgene is exogenous DNA that is integrated into the genome of a cell from which a 
transgenic animal develops and that remains in the genome of the mature animal, thereby 
directing the expression of an encoded gene product in one or more cell types or tissues of the
transgenic animal. As used herein, a "homologous recombinant animal" is a non-human animal, 
preferably a mammal, more preferably a mouse, in which an endogenous SECP gene has been 
altered by homologous recombination between the endogenous gene and an exogenous DNA 
molecule introduced into a cell of the animal, e.g., an embryonic cell of the animal, prior to 
development of the animal.

A transgenic animal of the invention can be created by introducing SECP-encoding
nucleic acid into the male pronuclei of a fertilized oocyte (e.g., by micro-injection, retroviral
infection) and allowing the oocyte to develop in a pseudopregnant female foster animal. The
human SECP cDNA sequences of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50,
52, 54 and/or 56 can be introduced as a transgene into the genome of a non-human animal.
Alternatively, a non-human homologue of the human SECP gene, such as a mouse SECP gene,
can be isolated based on hybridization to the human SECP cDNA (described further supra) and
used as a transgene. Intronic sequences and polyadenylation signals can also be included in the
transgene to increase the efficiency of expression of the transgene. A tissue-specific regulatory
sequence(s) can be operably-linked to the SECP transgene to direct expression of SECP protein
to particular cells. Methods for generating transgenic animals via embryo manipulation and
micro-injection, particularly animals such as mice, have become conventional in the art and are
described, for example, in U.S. Patent Nos. 4,736,866; 4,870,009; and 4,873,191; and Hogan,
1986. In: Manipulating the Mouse Embryo, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, N.Y. Similar methods are used for production of other transgenic animals. A
transgenic founder animal can be identified based upon the presence of the SECP transgene in its
genome and/or expression of SECP mRNA in tissues or cells of the animals. A transgenic
founder animal can then be used to breed additional animals carrying the transgene. Moreover,
transgenic animals carrying a transgene encoding SECP protein can further be bred to other
transgenic animals carrying other transgenes.

To create a homologous recombinant animal, a vector is prepared which contains at least
a portion of a SECP gene into which a deletion, addition or substitution has been introduced to
thereby alter, e.g., functionally disrupt, the SECP gene. The SECP gene can be a human gene
(e.g., the cDNA of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56),
but more preferably, is a non-human homologue of a human SECP gene. For example, a mouse
homologue of human SECP gene of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48,
50, 52, 54 and 56 can be used to construct a homologous recombination vector suitable for
altering an endogenous SECP gene in the mouse genome. In one embodiment, the vector is
designed such that, upon homologous recombination, the endogenous SECP gene is functionally
disrupted (i.e., no longer encodes a functional protein; also referred to as a "knock out" vector).

Alternatively, the vector can be designed such that, upon homologous recombination, the
endogenous SECP gene is mutated or otherwise altered but still encodes functional protein (e.g.,
the upstream regulatory region can be altered to thereby alter the expression of the endogenous
SECP protein). In the homologous recombination vector, the altered portion of the SECP gene is
flanked at its 5'- and 3'-termini by additional nucleic acid of the SECP gene to allow for
homologous recombination to occur between the exogenous SECP gene carried by the vector
and an endogenous SECP gene in an embryonic stem cell. The additional flanking SECP nucleic
acid is of sufficient length for successful homologous recombination with the endogenous gene.
Typically, several kilobases (kb) of flanking DNA (both at the 5'- and 3'-termini) are included in
the vector. See, e.g., Thomas, et al., 1987. Cell 51: 503 for a description of homologous
recombination vectors. The vector is then introduced into an embryonic stem cell line (e.g., by
electroporation) and cells in which the introduced SECP gene has homologously-recombined
with the endogenous SECP gene are selected. See, e.g., Li, et al., 1992. Cell 69: 915.

The selected cells are then micro-injected into a blastocyst of an animal (e.g., a mouse) to
form aggregation chimeras. See, e.g., Bradley, 1987. In: Teratocarcinomas and Embryonic
Stem Cells: A Practical Approach, Robertson, ed. IRL, Oxford, pp. 113-152. A chimeric
embryo can then be implanted into a suitable pseudopregnant female foster animal and the
embryo brought to term. Progeny harboring the homologously-recombined DNA in their germ
cells can be used to breed animals in which all cells of the animal contain the homologously-
recombined DNA by germline transmission of the transgene. Methods for constructing
homologous recombination vectors and homologous recombinant animals are described further 
in Bradley, 1991. Curr. Opin. Biotechnol. 2: 823-829; PCT International Publication Nos.: WO 
90/11354; WO 91/01140; WO 92/0968; and WO 93/04169. 

In another embodiment, transgenic non-human animals can be produced that contain
selected systems that allow for regulated expression of the transgene. One example of such a
system is the cre/loxP recombinase system of bacteriophage P1. For a description of the cre/loxP
recombinase system, see, e.g., Lakso, et al., 1992. Proc. Natl. Acad. Sci. USA 89: 6232-6236.
Another example of a recombinase system is the FLP recombinase system of Saccharomyces
cerevisiae. See, O'Gorman, et al., 1991. Science 251: 1351-1355. If a cre/loxP recombinase
system is used to regulate expression of the transgene, animals containing transgenes encoding
both the Cre recombinase and a selected protein are required. Such animals can be provided
through the construction of "double" transgenic animals, e.g., by mating two transgenic animals,
one containing a transgene encoding a selected protein and the other containing a transgene
encoding a recombinase.

Clones of the non-human transgenic animals described herein can also be produced
according to the methods described in Wilmut, et al., 1997. Nature 385: 810-813. In brief, a cell
(e.g., a somatic cell) from the transgenic animal can be isolated and induced to exit the growth
cycle and enter G0 phase. The quiescent cell can then be fused, e.g., through the use of electrical
pulses, to an enucleated oocyte from an animal of the same species from which the quiescent cell
is isolated. The reconstructed oocyte is then cultured such that it develops to a morula or
blastocyst and is then transferred to a pseudopregnant female foster animal. The offspring borne of
this female foster animal will be a clone of the animal from which the cell (e.g., the somatic cell)
is isolated.

Pharmaceutical Compositions 

The SECP nucleic acid molecules, SECP proteins, and anti-SECP antibodies (also
referred to herein as "active compounds") of the invention, and derivatives, fragments, analogs
and homologs thereof, can be incorporated into pharmaceutical compositions suitable for
administration. Such compositions typically comprise the nucleic acid molecule, protein, or
antibody and a pharmaceutically-acceptable carrier. As used herein, "pharmaceutically-
acceptable carrier" is intended to include any and all solvents, dispersion media, coatings,
antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like,
compatible with pharmaceutical administration. Suitable carriers are described in the most
recent edition of Remington's Pharmaceutical Sciences, a standard reference text in the field,
which is incorporated herein by reference. Preferred examples of such carriers or diluents
include, but are not limited to, water, saline, Ringer's solutions, dextrose solution, and 5% human
serum albumin. Liposomes and other non-aqueous (i.e., lipophilic) vehicles such as fixed oils
may also be used. The use of such media and agents for pharmaceutically active substances is
well known in the art. Except insofar as any conventional media or agent is incompatible with
the active compound, use thereof in the compositions is contemplated. Supplementary active
compounds can also be incorporated into the compositions.

A pharmaceutical composition of the invention is formulated to be compatible with its
intended route of administration. Examples of routes of administration include parenteral, e.g.,
intravenous, intradermal, subcutaneous, oral (e.g., inhalation), transdermal (i.e., topical),
transmucosal, and rectal administration. Solutions or suspensions used for parenteral,
intradermal, or subcutaneous application can include the following components: a sterile diluent
such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene
glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl
parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as
ethylenediaminetetraacetic acid (EDTA); buffers such as acetates, citrates or phosphates; and
agents for the adjustment of tonicity such as sodium chloride or dextrose. The pH can be
adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral
preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of
glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions 

15 (where water soluble) or dispersions and sterile powders for the extemporaneous preparation of 
sterile injectable solutions or dispersion. For intravenous administration, suitable carriers 
include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, N. J.) or 
phosphate buffered saline (PBS). In all cases, the composition must be sterile and should be 
fluid to the extent that easy syringeability exists. It must be stable under the conditions of 

20 manufacture and storage and must be preserved against the contaminating action of 

microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium 
containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and 
liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can 
be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the
required particle size in the case of dispersion and by the use of surfactants. Prevention of the
action of microorganisms can be achieved by various antibacterial and antifungal agents, for
example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many
cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as
mannitol and sorbitol, or sodium chloride in the composition. Prolonged absorption of the injectable
compositions can be brought about by including in the composition an agent which delays
absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active compound (e.g., a 
SECP protein or anti-SECP antibody) in the required amount in an appropriate solvent with one 
or a combination of ingredients enumerated above, as required, followed by filtered sterilization. 
Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle
that contains a basic dispersion medium and the required other ingredients from those 
enumerated above. In the case of sterile powders for the preparation of sterile injectable 
solutions, methods of preparation are vacuum drying and freeze-drying that yield a powder of
the active ingredient plus any additional desired ingredient from a previously sterile-filtered
solution thereof. 

Oral compositions generally include an inert diluent or an edible carrier. They can be 
enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic 
administration, the active compound can be incorporated with excipients and used in the form of
tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier for use
as a mouthwash, wherein the compound in the fluid carrier is applied orally and swished and 
expectorated or swallowed. Pharmaceutically compatible binding agents, and/or adjuvant 
materials can be included as part of the composition. The tablets, pills, capsules, troches and the 
like can contain any of the following ingredients, or compounds of a similar nature: a binder
such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or
lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as
magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent 
such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or 
orange flavoring. 

For administration by inhalation, the compounds are delivered in the form of an aerosol
spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such
as carbon dioxide, or a nebulizer. 

Systemic administration can also be by transmucosal or transdermal means. For 
transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated 
are used in the formulation. Such penetrants are generally known in the art, and include, for
example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. 
Transmucosal administration can be accomplished through the use of nasal sprays or 
suppositories. For transdermal administration, the active compounds are formulated into 
ointments, salves, gels, or creams as generally known in the art. 

The compounds can also be prepared in the form of suppositories (e.g., with conventional
suppository bases such as cocoa butter and other glycerides) or retention enemas for rectal
delivery. 

In one embodiment, the active compounds are prepared with carriers that will protect the
compound against rapid elimination from the body, such as a controlled release formulation, 
including implants and microencapsulated delivery systems. Biodegradable, biocompatible 
polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, 
collagen, polyorthoesters, and polylactic acid. Methods for preparation of such formulations will
be apparent to those skilled in the art. The materials can also be obtained commercially from 
Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes 
targeted to infected cells with monoclonal antibodies to viral antigens) can also be used as 
pharmaceutically acceptable carriers. These can be prepared according to methods known to
those skilled in the art, for example, as described in U.S. Patent No. 4,522,811.

It is especially advantageous to formulate oral or parenteral compositions in dosage unit 
form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers 
to physically discrete units suited as unitary dosages for the subject to be treated; each unit 
containing a predetermined quantity of active compound calculated to produce the desired 
therapeutic effect in association with the required pharmaceutical carrier. The specification for
the dosage unit forms of the invention is dictated by and directly dependent on the unique
characteristics of the active compound and the particular therapeutic effect to be achieved, and 
the limitations inherent in the art of compounding such an active compound for the treatment of 
individuals. 

The nucleic acid molecules of the invention can be inserted into vectors and used as gene
therapy vectors. Gene therapy vectors can be delivered to a subject by, for example, intravenous
injection, local administration (see, e.g., U.S. Patent No. 5,328,470) or by stereotactic injection
(see, e.g., Chen, et al., 1994. Proc. Natl. Acad. Sci. USA 91: 3054-3057). The pharmaceutical
preparation of the gene therapy vector can include the gene therapy vector in an acceptable
diluent, or can comprise a slow release matrix in which the gene delivery vehicle is imbedded.
Alternatively, where the complete gene delivery vector can be produced intact from recombinant 
cells, e.g., retroviral vectors, the pharmaceutical preparation can include one or more cells that 
produce the gene delivery system. 

The pharmaceutical compositions can be included in a container, pack, or dispenser 
together with instructions for administration.

Screening and Detection Methods 

The nucleic acid molecules, proteins, protein homologues, and antibodies described 
herein can be used in one or more of the following methods: (A) screening assays; (B) detection 
assays (e.g., chromosomal mapping, cell and tissue typing, forensic biology); (C) predictive
medicine (e.g., diagnostic assays, prognostic assays, monitoring clinical trials, and 
pharmacogenomics); and (D) methods of treatment (e.g., therapeutic and prophylactic). 

The isolated nucleic acid molecules of the present invention can be used to express SECP 
protein (e.g., via a recombinant expression vector in a host cell in gene therapy applications), to
detect SECP mRNA (e.g., in a biological sample) or a genetic lesion in an SECP gene, and to 
modulate SECP activity, as described further below. In addition, the SECP proteins can be used 
to screen drugs or compounds that modulate the SECP protein activity or expression as well as to 
treat disorders characterized by insufficient or excessive production of SECP protein or 
production of SECP protein forms that have decreased or aberrant activity compared to SECP
wild-type protein. In addition, the anti-SECP antibodies of the present invention can be used to 
detect and isolate SECP proteins and modulate SECP activity. 

The invention further pertains to novel agents identified by the screening assays 
described herein and uses thereof for treatments as previously described. 

Screening Assays

The invention provides a method (also referred to herein as a "screening assay") for 
identifying modulators, candidate or test compounds or agents (e.g., peptides, 
peptidomimetics, small molecules or other drugs) that bind to SECP proteins or have a 
stimulatory or inhibitory effect on, e.g., SECP protein expression or SECP protein activity. The
invention also includes compounds identified in the screening assays described herein.

In one embodiment, the invention provides assays for screening candidate or test 
compounds which bind to or modulate the activity of the membrane-bound form of a SECP 
protein or polypeptide or biologically-active portion thereof. The test compounds of the 
invention can be obtained using any of the numerous approaches in combinatorial library
methods known in the art, including: biological libraries; spatially addressable parallel solid
phase or solution phase libraries; synthetic library methods requiring deconvolution; the
"one-bead one-compound" library method; and synthetic library methods using affinity
chromatography selection. The biological library approach is limited to peptide libraries, while
the other four approaches are applicable to peptide, non-peptide oligomer or small molecule
libraries of compounds. See, e.g., Lam, 1997. Anticancer Drug Design 12: 145.

A "small molecule" as used herein, is meant to refer to a composition that has a molecular 
weight of less than about 5 kD and most preferably less than about 4 kD. Small molecules can 
be, e.g., nucleic acids, peptides, polypeptides, peptidomimetics, carbohydrates, lipids or other 
organic or inorganic molecules. Libraries of chemical and/or biological mixtures, such as fungal,
bacterial, or algal extracts, are known in the art and can be screened with any of the assays of the 
invention. 

Examples of methods for the synthesis of molecular libraries can be found in the art, for
example in: DeWitt, et al., 1993. Proc. Natl. Acad. Sci. U.S.A. 90: 6909; Erb, et al., 1994. Proc.
Natl. Acad. Sci. U.S.A. 91: 11422; Zuckermann, et al., 1994. J. Med. Chem. 37: 2678; Cho, et al.,
1993. Science 261: 1303; Carrell, et al., 1994. Angew. Chem. Int. Ed. Engl. 33: 2059; Carell, et
al., 1994. Angew. Chem. Int. Ed. Engl. 33: 2061; and Gallop, et al., 1994. J. Med. Chem. 37:
1233.

Libraries of compounds may be presented in solution (e.g., Houghten, 1992.
Biotechniques 13: 412-421), or on beads (Lam, 1991. Nature 354: 82-84), on chips (Fodor, 1993.
Nature 364: 555-556), bacteria (Ladner, U.S. Patent No. 5,223,409), spores (Ladner, U.S. Patent
5,233,409), plasmids (Cull, et al., 1992. Proc. Natl. Acad. Sci. USA 89: 1865-1869) or on phage
(Scott and Smith, 1990. Science 249: 386-390; Devlin, 1990. Science 249: 404-406; Cwirla, et
al., 1990. Proc. Natl. Acad. Sci. U.S.A. 87: 6378-6382; Felici, 1991. J. Mol. Biol. 222: 301-310;
Ladner, U.S. Patent No. 5,233,409).

In one embodiment, an assay is a cell-based assay in which a cell which expresses a
membrane-bound form of SECP protein, or a biologically-active portion thereof, on the cell
surface is contacted with a test compound and the ability of the test compound to bind to a SECP
protein is determined. The cell, for example, can be of mammalian origin or a yeast cell.
Determining the ability of the test compound to bind to the SECP protein can be accomplished,
for example, by coupling the test compound with a radioisotope or enzymatic label such that
binding of the test compound to the SECP protein or biologically-active portion thereof can be
determined by detecting the labeled compound in a complex. For example, test compounds can
be labeled with 125I, 35S, 14C, or 3H, either directly or indirectly, and the radioisotope detected by
direct counting of radioemission or by scintillation counting. Alternatively, test compounds can
be enzymatically-labeled with, for example, horseradish peroxidase, alkaline phosphatase, or
luciferase, and the enzymatic label detected by determination of conversion of an appropriate
substrate to product. In one embodiment, the assay comprises contacting a cell which expresses
a membrane-bound form of SECP protein, or a biologically-active portion thereof, on the cell
surface with a known compound which binds SECP to form an assay mixture, contacting the
assay mixture with a test compound, and determining the ability of the test compound to interact
with a SECP protein, wherein determining the ability of the test compound to interact with a
SECP protein comprises determining the ability of the test compound to preferentially bind to
SECP protein or a biologically-active portion thereof as compared to the known compound.

In another embodiment, an assay is a cell-based assay comprising contacting a cell 
expressing a membrane-bound form of SECP protein, or a biologically-active portion thereof, on 
the cell surface with a test compound and determining the ability of the test compound to
modulate (e.g., stimulate or inhibit) the activity of the SECP protein or biologically-active
portion thereof. Determining the ability of the test compound to modulate the activity of SECP
or a biologically-active portion thereof can be accomplished, for example, by determining the
ability of the SECP protein to bind to or interact with a SECP target molecule. As used herein, a
"target molecule" is a molecule with which a SECP protein binds or interacts in nature, for
example, a molecule on the surface of a cell which expresses a SECP interacting protein, a
molecule on the surface of a second cell, a molecule in the extracellular milieu, a molecule
associated with the internal surface of a cell membrane or a cytoplasmic molecule. An SECP
target molecule can be a non-SECP molecule or a SECP protein or polypeptide of the invention.
In one embodiment, a SECP target molecule is a component of a signal transduction pathway
that facilitates transduction of an extracellular signal (e.g., a signal generated by binding of a
compound to a membrane-bound SECP molecule) through the cell membrane and into the cell. 
The target, for example, can be a second intercellular protein that has catalytic activity or a 
protein that facilitates the association of downstream signaling molecules with SECP. 

Determining the ability of the SECP protein to bind to or interact with a SECP target
molecule can be accomplished by one of the methods described above for determining direct
binding. In one embodiment, determining the ability of the SECP protein to bind to or interact
with a SECP target molecule can be accomplished by determining the activity of the target
molecule. For example, the activity of the target molecule can be determined by detecting
induction of a cellular second messenger of the target (i.e., intracellular Ca2+, diacylglycerol, IP3,
etc.), detecting catalytic/enzymatic activity of the target on an appropriate substrate, detecting the
induction of a reporter gene (comprising a SECP-responsive regulatory element operatively 
linked to a nucleic acid encoding a detectable marker, e.g., luciferase), or detecting a cellular 
response, for example, cell survival, cellular differentiation, or cell proliferation. 

In yet another embodiment, an assay of the invention is a cell-free assay comprising
contacting a SECP protein or biologically-active portion thereof with a test compound and
determining the ability of the test compound to bind to the SECP protein or biologically-active
portion thereof. Binding of the test compound to the SECP protein can be determined either
directly or indirectly as described above. In one such embodiment, the assay comprises
contacting the SECP protein or biologically-active portion thereof with a known compound
which binds SECP to form an assay mixture, contacting the assay mixture with a test compound, 
and determining the ability of the test compound to interact with a SECP protein, wherein 
determining the ability of the test compound to interact with a SECP protein comprises 
determining the ability of the test compound to preferentially bind to SECP or biologically-active
portion thereof as compared to the known compound. 

In still another embodiment, an assay is a cell-free assay comprising contacting SECP 
protein or biologically-active portion thereof with a test compound and determining the ability of 
the test compound to modulate (e.g., stimulate or inhibit) the activity of the SECP protein or
biologically-active portion thereof. Determining the ability of the test compound to modulate the
activity of SECP can be accomplished, for example, by determining the ability of the SECP
protein to bind to a SECP target molecule by one of the methods described above for
determining direct binding. In an alternative embodiment, determining the ability of the test
compound to modulate the activity of SECP protein can be accomplished by determining the
ability of the SECP protein to further modulate a SECP target molecule. For example, the
catalytic/enzymatic activity of the target molecule on an appropriate substrate can be determined
as described, supra. 

In yet another embodiment, the cell-free assay comprises contacting the SECP protein or 
biologically-active portion thereof with a known compound which binds SECP protein to form 
an assay mixture, contacting the assay mixture with a test compound, and determining the ability
of the test compound to interact with a SECP protein, wherein determining the ability of the test 
compound to interact with a SECP protein comprises determining the ability of the SECP protein 
to preferentially bind to or modulate the activity of a SECP target molecule. 

The cell-free assays of the invention are amenable to use of either the soluble form or the
membrane-bound form of SECP protein. In the case of cell-free assays comprising the
membrane-bound form of SECP protein, it may be desirable to utilize a solubilizing agent such
that the membrane-bound form of SECP protein is maintained in solution. Examples of such
solubilizing agents include non-ionic detergents such as n-octylglucoside, n-dodecylglucoside,
n-dodecylmaltoside, octanoyl-N-methylglucamide, decanoyl-N-methylglucamide, Triton®
X-100, Triton® X-114, Thesit®, Isotridecypoly(ethylene glycol ether)n,
N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate,
3-(3-cholamidopropyl)dimethylamminiol-1-propane sulfonate (CHAPS), or
3-(3-cholamidopropyl)dimethylamminiol-2-hydroxy-1-propane sulfonate (CHAPSO).



In more than one embodiment of the above assay methods of the invention, it may be 
desirable to immobilize either SECP protein or its target molecule to facilitate separation of 
complexed from uncomplexed forms of one or both of the proteins, as well as to accommodate 
automation of the assay. Binding of a test compound to SECP protein, or interaction of SECP 
protein with a target molecule in the presence and absence of a candidate compound, can be
accomplished in any vessel suitable for containing the reactants. Examples of such vessels 
include microtiter plates, test tubes, and micro-centrifuge tubes. In one embodiment, a fusion 
protein can be provided that adds a domain that allows one or both of the proteins to be bound to 
a matrix. For example, GST-SECP fusion proteins or GST-target fusion proteins can be
adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, MO) or glutathione
derivatized microtiter plates, that are then combined with the test compound or the test
compound and either the non-adsorbed target protein or SECP protein, and the mixture is
incubated under conditions conducive to complex formation (e.g., at physiological conditions for
salt and pH). Following incubation, the beads or microtiter plate wells are washed to remove any
unbound components, the matrix is immobilized in the case of beads, and the complex is determined either
directly or indirectly, for example, as described, supra. Alternatively, the complexes can be
dissociated from the matrix, and the level of SECP protein binding or activity determined using 
standard techniques. 

Other techniques for immobilizing proteins on matrices can also be used in the screening 
assays of the invention. For example, either the SECP protein or its target molecule can be
immobilized utilizing conjugation of biotin and streptavidin. Biotinylated SECP protein or target
molecules can be prepared from biotin-NHS (N-hydroxy-succinimide) using techniques
well-known within the art (e.g., biotinylation kit, Pierce Chemicals, Rockford, Ill.), and immobilized
in the wells of streptavidin-coated 96 well plates (Pierce Chemical). Alternatively, antibodies
reactive with SECP protein or target molecules, but which do not interfere with binding of the
SECP protein to its target molecule, can be derivatized to the wells of the plate, and unbound
target or SECP protein trapped in the wells by antibody conjugation. Methods for detecting such
complexes, in addition to those described above for the GST-immobilized complexes, include
immunodetection of complexes using antibodies reactive with the SECP protein or target
molecule, as well as enzyme-linked assays that rely on detecting an enzymatic activity associated
with the SECP protein or target molecule.

In another embodiment, modulators of SECP protein expression are identified in a 
method wherein a cell is contacted with a candidate compound and the expression of SECP 
mRNA or protein in the cell is determined. The level of expression of SECP mRNA or protein 
in the presence of the candidate compound is compared to the level of expression of SECP
mRNA or protein in the absence of the candidate compound. The candidate compound can then 
be identified as a modulator of SECP mRNA or protein expression based upon this comparison. 
For example, when expression of SECP mRNA or protein is greater (i.e., statistically 
significantly greater) in the presence of the candidate compound than in its absence, the
candidate compound is identified as a stimulator of SECP mRNA or protein expression. 
Alternatively, when expression of SECP mRNA or protein is less (statistically significantly less) 
in the presence of the candidate compound than in its absence, the candidate compound is 
identified as an inhibitor of SECP mRNA or protein expression. The level of SECP mRNA or 
protein expression in the cells can be determined by methods described herein for detecting
SECP mRNA or protein. 
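
The comparison described above can be sketched in code. The following is only an illustration and is not part of the disclosed methods. It assumes replicate expression measurements in arbitrary units, uses an exact two-sided permutation test on the difference of means as one possible test of statistical significance, and assumes a significance threshold of 0.05. The function names and the threshold are hypothetical choices.

```python
import itertools
import statistics


def permutation_p_value(treated, control):
    """Exact two-sided permutation test on the difference of group means."""
    observed = abs(statistics.mean(treated) - statistics.mean(control))
    pooled = list(treated) + list(control)
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling the pooled measurements.
    for chosen in itertools.combinations(indices, len(treated)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        diff = abs(statistics.mean(group_a) - statistics.mean(group_b))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


def classify_modulator(treated, control, alpha=0.05):
    """Label a candidate compound from expression measured with and without it."""
    if permutation_p_value(treated, control) >= alpha:
        return "no significant effect"
    if statistics.mean(treated) > statistics.mean(control):
        return "stimulator"
    return "inhibitor"
```

With four replicates per condition the exact test enumerates only 70 relabelings, so it remains practical for small assay panels; larger replicate counts would call for a sampled permutation test or a parametric test instead.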

In yet another aspect of the invention, the SECP proteins can be used as "bait proteins" in 
a two-hybrid assay or three hybrid assay (see, e.g., U.S. Patent No. 5,283,317; Zervos, et al.,
1993. Cell 72: 223-232; Madura, et al., 1993. J. Biol. Chem. 268: 12046-12054; Bartel, et al.,
1993. Biotechniques 14: 920-924; Iwabuchi, et al., 1993. Oncogene 8: 1693-1696; and Brent
WO 94/10300), to identify other proteins that bind to or interact with SECP ("SECP-binding 
proteins" or "SECP-bp") and modulate SECP activity. Such SECP-binding proteins are also 
likely to be involved in the propagation of signals by the SECP proteins as, for example, 
upstream or downstream elements of the SECP pathway. 

The two-hybrid system is based on the modular nature of most transcription factors,
which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes two
different DNA constructs. In one construct, the gene that codes for SECP is fused to a gene 
encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In the other 
construct, a DNA sequence, from a library of DNA sequences, that encodes an unidentified
protein ("prey" or "sample") is fused to a gene that codes for the activation domain of the known
transcription factor. If the "bait" and the "prey" proteins are able to interact, in vivo, forming a 
SECP-dependent complex, the DNA-binding and activation domains of the transcription factor 
are brought into close proximity. This proximity allows transcription of a reporter gene (e.g., 
LacZ) that is operably linked to a transcriptional regulatory site responsive to the transcription
factor. Expression of the reporter gene can be detected and cell colonies containing the
functional transcription factor can be isolated and used to obtain the cloned gene that encodes the
protein which interacts with SECP. 

The invention further pertains to novel agents identified by the aforementioned screening
assays and uses thereof for treatments as described herein.

Detection Assays

Portions or fragments of the cDNA sequences identified herein (and the corresponding 
complete gene sequences) can be used in numerous ways as polynucleotide reagents. By way of 
example, and not of limitation, these sequences can be used to: (i) map their respective genes on
a chromosome; and, thus, locate gene regions associated with genetic disease; (ii) identify an
individual from a minute biological sample (tissue typing); and (iii) aid in forensic identification
of a biological sample. Some of these applications are described in the subsections below. 

Chromosome Mapping 

Once the sequence (or a portion of the sequence) of a gene has been isolated, this 
sequence can be used to map the location of the gene on a chromosome. This process is called
chromosome mapping. Accordingly, portions or fragments of the SECP sequences shown in
SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or fragments or
derivatives thereof, can be used to map the location of the SECP genes, respectively, on a
chromosome. The mapping of the SECP sequences to chromosomes is an important first step in
correlating these sequences with genes associated with disease.

Briefly, SECP genes can be mapped to chromosomes by preparing PCR primers 
(preferably 15-25 bp in length) from the SECP sequences. Computer analysis of the SECP
sequences can be used to rapidly select primers that do not span more than one exon in the
genomic DNA, since primers spanning more than one exon would complicate the amplification
process. These primers can then be used for
PCR screening of somatic cell hybrids containing individual human chromosomes. Only those
hybrids containing the human gene corresponding to the SECP sequences will yield an amplified 
fragment. 
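
One way the computer-assisted primer selection described above might look is sketched below. This is an illustration under stated assumptions rather than a disclosed procedure: the exon boundaries are assumed to be known as half-open intervals in cDNA coordinates, the default primer length of 20 is simply a value inside the preferred 15-25 bp range, and the minimum product size and all function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def primers_within_exons(cdna, exons, primer_len=20, min_product=60):
    """Propose one primer pair per exon so that no primer spans two exons.

    exons is a list of half-open (start, end) intervals in cDNA coordinates.
    Exons too short to hold both primers and a minimal product are skipped.
    """
    candidates = []
    for exon_start, exon_end in exons:
        if exon_end - exon_start < max(min_product, 2 * primer_len):
            continue
        # Forward primer at the start of the exon, reverse primer at its end.
        forward = cdna[exon_start:exon_start + primer_len]
        reverse = reverse_complement(cdna[exon_end - primer_len:exon_end])
        candidates.append((exon_start, exon_end, forward, reverse))
    return candidates
```

Because both primers of each pair lie inside one exon, the amplified region is contiguous in genomic DNA as well as in the cDNA, which is the property the selection step is meant to guarantee.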

Somatic cell hybrids are prepared by fusing somatic cells from different mammals (e.g., 
human and mouse cells). As hybrids of human and mouse cells grow and divide, they gradually
lose human chromosomes in random order, but retain the mouse chromosomes. By using media
in which mouse cells cannot grow, because they lack a particular enzyme, but in which human 
cells can, the one human chromosome that contains the gene encoding the needed enzyme will 
be retained. By using various media, panels of hybrid cell lines can be established. Each cell 
line in a panel contains either a single human chromosome or a small number of human
chromosomes, and a full set of mouse chromosomes, allowing easy mapping of individual genes
to specific human chromosomes. See, e.g., D'Eustachio, et al., 1983. Science 220: 919-924.
Somatic cell hybrids containing only fragments of human chromosomes can also be produced by 
using human chromosomes with translocations and deletions. 
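
The logic of assigning a sequence to a chromosome from a hybrid panel can be illustrated as a concordance check. The sketch below is hypothetical: it assumes that the human chromosome content of each hybrid cell line is known and that the PCR result for each line is a simple positive or negative. The panel contents and the function name are invented for illustration.

```python
def assign_chromosome(panel, pcr_positive):
    """Return the chromosomes whose presence across the panel matches the PCR results.

    panel maps each hybrid cell line to the set of human chromosomes it retains.
    pcr_positive is the set of cell lines that yielded an amplified fragment.
    """
    positive = set(pcr_positive)
    all_chromosomes = set().union(*panel.values())
    concordant = []
    for chromosome in sorted(all_chromosomes):
        lines_with = {line for line, chroms in panel.items() if chromosome in chroms}
        # A perfect match means every positive line carries the chromosome
        # and every negative line lacks it.
        if lines_with == positive:
            concordant.append(chromosome)
    return concordant
```

If more than one chromosome is returned, the panel cannot distinguish between them and additional hybrid lines would be needed.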

PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular
sequence to a particular chromosome. Three or more sequences can be assigned per day using a 
single thermal cycler. Using the SECP sequences to design oligonucleotide primers, sub- 
localization can be achieved with panels of fragments from specific chromosomes. 

Fluorescence in situ hybridization (FISH) of a DNA sequence to a metaphase
chromosomal spread can further be used to provide a precise chromosomal location in one step.
Chromosome spreads can be made using cells whose division has been blocked in metaphase by
a chemical like colcemid that disrupts the mitotic spindle. The chromosomes can be treated
briefly with trypsin, and then stained with Giemsa. A pattern of light and dark bands develops
on each chromosome, so that the chromosomes can be identified individually. The FISH
technique can be used with a DNA sequence as short as 500 or 600 bases. However, clones
larger than 1,000 bases have a higher likelihood of binding to a unique chromosomal location
with sufficient signal intensity for simple detection. Preferably 1,000 bases, and more preferably
2,000 bases, will suffice to get good results in a reasonable amount of time. For a review of this
technique, see Verma, et al., Human Chromosomes: A Manual of Basic Techniques
(Pergamon Press, New York 1988).

Reagents for chromosome mapping can be used individually to mark a single 
chromosome or a single site on that chromosome, or panels of reagents can be used for marking 
multiple sites and/or multiple chromosomes. Reagents corresponding to non-coding regions of 
the genes actually are preferred for mapping purposes. Coding sequences are more likely to be
conserved within gene families, thus increasing the chance of cross hybridizations during 
chromosomal mapping. 

Once a sequence has been mapped to a precise chromosomal location, the physical 
position of the sequence on the chromosome can be correlated with genetic map data. Such data 
are found, e.g., in McKusick, Mendelian Inheritance in Man (available on-line through Johns
Hopkins University Welch Medical Library). The relationship between genes and disease,
mapped to the same chromosomal region, can then be identified through linkage analysis
(co-inheritance of physically adjacent genes), described in, e.g., Egeland, et al., 1987. Nature,
325: 783-787. 

Additionally, differences in the DNA sequences between individuals affected and
unaffected with a disease associated with the SECP gene can be determined. If a mutation is
observed in some or all of the affected individuals but not in any unaffected individuals, then the
mutation is likely to be the causative agent of the particular disease. Comparison of affected and
unaffected individuals generally involves first looking for structural alterations in the
chromosomes, such as deletions or translocations that are visible from chromosome spreads or
detectable using PCR based on that DNA sequence. Ultimately, complete sequencing of genes
from several individuals can be performed to confirm the presence of a mutation and to
distinguish mutations from polymorphisms.
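
The rule stated above (a sequence difference seen in some or all affected individuals but in no unaffected individual) can be expressed as simple set logic. The sketch below is illustrative only; the variant labels and the function name are hypothetical, and each individual's sequence differences are assumed to have been called already.

```python
def partition_variants(affected, unaffected):
    """Split observed variants into candidate mutations and likely polymorphisms.

    affected and unaffected map an individual's identifier to the set of
    sequence differences observed in that individual.
    """
    seen_in_affected = set().union(*affected.values())
    seen_in_unaffected = set().union(*unaffected.values())
    # Candidates: present in at least one affected, absent from all unaffected.
    candidates = seen_in_affected - seen_in_unaffected
    # Differences shared with unaffected individuals behave like polymorphisms.
    shared = seen_in_affected & seen_in_unaffected
    return candidates, shared
```

A candidate identified this way is only a hypothesis; with few individuals, a rare polymorphism can be absent from the unaffected group by chance.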

Tissue Typing 

The SECP sequences of the invention can also be used to identify individuals from 
minute biological samples. In this technique, an individual's genomic DNA is digested with one 
or more restriction enzymes, and probed on a Southern blot to yield unique bands for 
identification. The sequences of the invention are useful as additional DNA markers for RFLP
("restriction fragment length polymorphisms," as described in U.S. Patent No. 5,272,057). 

Furthermore, the sequences of the invention can be used to provide an alternative 
technique that determines the actual base-by-base DNA sequence of selected portions of an 
individual's genome. Thus, the SECP sequences described herein can be used to prepare two 
PCR primers from the 5'- and 3'-termini of the sequences. These primers can then be used to
amplify an individual's DNA and subsequently sequence it. 

Panels of corresponding DNA sequences from individuals, prepared in this manner, can
provide unique individual identifications, as each individual will have a unique set of such DNA
sequences due to allelic differences. The sequences of the invention can be used to obtain such
identification sequences from individuals and from tissue. The SECP sequences of the invention
uniquely represent portions of the human genome. Allelic variation occurs to some degree in the
coding regions of these sequences, and to a greater degree in the non-coding regions. It is
estimated that allelic variation between individual humans occurs with a frequency of about once
per each 500 bases. Much of the allelic variation is due to single nucleotide polymorphisms
(SNPs), which include restriction fragment length polymorphisms (RFLPs).
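
The frequency estimate above lends itself to a quick back-of-envelope calculation. The following Python sketch is illustrative only: the function name and the example panel sizes and amplicon length are assumptions chosen for the example. It applies the average of about one variation per 500 bases uniformly, which ignores the higher rate stated for non-coding regions.

```python
def expected_variant_sites(n_amplicons: int,
                           amplicon_length: int = 100,
                           bases_per_variant: int = 500) -> float:
    """Expected number of variant positions across a panel of amplicons.

    Assumes variation is spread uniformly at one site per
    `bases_per_variant` bases (a simplifying assumption).
    """
    return n_amplicons * amplicon_length / bases_per_variant


# A hypothetical panel of 10 amplicons of 100 bases covers 1,000 bases,
# i.e. about 2 expected variant sites; 1,000 amplicons give about 200.
small_panel = expected_variant_sites(10)
large_panel = expected_variant_sites(1000)
```

Under these assumptions a larger panel is needed when the per-base variation rate is lower, which is consistent with the text calling for more primers when coding sequences are used.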

Each of the sequences described herein can, to some degree, be used as a standard against 
which DNA from an individual can be compared for identification purposes. Because greater 
numbers of polymorphisms occur in the non-coding regions, fewer sequences are necessary to 
differentiate individuals. The non-coding sequences can comfortably provide positive individual 
identification with a panel of perhaps 10 to 1,000 primers that each yield a non-coding amplified
sequence of 100 bases. If predicted coding sequences, such as those in SEQ ID NO:1, 3, 5, 7, 9,
11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 are used, a more appropriate number of
primers for positive individual identification would be 500-2,000.

Use of Partial SECP Sequences in Forensic Biology 
DNA-based identification techniques can also be used in forensic biology. Forensic
biology is a scientific field employing genetic typing of biological evidence found at a crime
scene as a means for positively identifying, e.g., a perpetrator of a crime. To make such an
identification, PCR technology can be used to amplify DNA sequences taken from very small
biological samples such as tissues (e.g., hair or skin) or body fluids (e.g., blood, saliva, or semen)
found at a crime scene. The amplified sequence can then be compared to a standard, thereby
allowing identification of the origin of the biological sample.

The sequences of the invention can be used to provide polynucleotide reagents, e.g., PCR
primers, targeted to specific loci in the human genome, that can enhance the reliability of
DNA-based forensic identifications by, for example, providing another "identification marker"
(i.e., another DNA sequence that is unique to a particular individual). As mentioned above,
actual base sequence information can be used for identification as an accurate alternative to
patterns formed by restriction enzyme generated fragments. Sequences targeted to non-coding
regions of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 are
particularly appropriate for this use as greater numbers of polymorphisms occur in the
non-coding regions, making it easier to differentiate individuals using this technique. Examples of
polynucleotide reagents include the SECP sequences or portions thereof, e.g., fragments derived
from the non-coding regions of one or more of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42,
44, 46, 48, 50, 52, 54 and 56 having a length of at least 20 bases, preferably at least 30 bases.

The SECP sequences described herein can further be used to provide polynucleotide
reagents, e.g., labeled or label-able probes that can be used, for example, in an in situ
hybridization technique, to identify a specific tissue (e.g., brain tissue). This can be very
useful in cases where a forensic pathologist is presented with a tissue of unknown origin. Panels
of such SECP probes can be used to identify tissue by species and/or by organ type.

In a similar fashion, these reagents, e.g., SECP primers or probes, can be used to screen
tissue culture for contamination (i.e., screen for the presence of a mixture of different types of
cells in a culture).

Predictive Medicine 

The invention also pertains to the field of predictive medicine in which diagnostic assays,
prognostic assays, pharmacogenomics, and monitoring clinical trials are used for prognostic
(predictive) purposes to thereby treat an individual prophylactically. Accordingly, one aspect of
the invention relates to diagnostic assays for determining SECP protein and/or nucleic acid
expression as well as SECP activity, in the context of a biological sample (e.g., blood, serum,
cells, tissue) to thereby determine whether an individual is afflicted with a disease or disorder, or
is at risk of developing a disorder, associated with aberrant SECP expression or activity. The
invention also provides for prognostic (or predictive) assays for determining whether an
individual is at risk of developing a disorder associated with SECP protein, nucleic acid
expression or activity. For example, mutations in a SECP gene can be assayed in a biological
sample. Such assays can be used for prognostic or predictive purposes to thereby prophylactically
treat an individual prior to the onset of a disorder characterized by or associated with SECP
protein, nucleic acid expression, or biological activity.

Another aspect of the invention provides methods for determining SECP protein, nucleic
acid expression or activity in an individual to thereby select appropriate therapeutic or
prophylactic agents for that individual (referred to herein as "pharmacogenomics").
Pharmacogenomics allows for the selection of agents (e.g., drugs) for therapeutic or prophylactic
treatment of an individual based on the genotype of the individual (e.g., the genotype of the
individual examined to determine the ability of the individual to respond to a particular agent).

Yet another aspect of the invention pertains to monitoring the influence of agents (e.g.,
drugs, compounds) on the expression or activity of SECP in clinical trials.

These and various other agents are described in further detail in the following sections.

Diagnostic Assays

An exemplary method for detecting the presence or absence of SECP in a biological 
sample involves obtaining a biological sample from a test subject and contacting the biological
sample with a compound or an agent capable of detecting SECP protein or nucleic acid (e.g., 
mRNA, genomic DNA) that encodes SECP protein such that the presence of SECP is detected in 
the biological sample. An agent for detecting SECP mRNA or genomic DNA is a labeled 
nucleic acid probe capable of hybridizing to SECP mRNA or genomic DNA. The nucleic acid 
probe can be, for example, a full-length SECP nucleic acid, such as the nucleic acid of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 or a portion thereof, such as
an oligonucleotide of at least 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to 
specifically hybridize under stringent conditions to SECP mRNA or genomic DNA. Other 
suitable probes for use in the diagnostic assays of the invention are described herein. 

An agent for detecting SECP protein is an antibody capable of binding to SECP protein,
preferably an antibody with a detectable label. Antibodies can be polyclonal, or more preferably,
monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab')2) can be used. The term
"labeled", with regard to the probe or antibody, is intended to encompass direct labeling of the
probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or
antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent
that is directly labeled. Examples of indirect labeling include detection of a primary antibody
using a fluorescently-labeled secondary antibody and end-labeling of a DNA probe with biotin
such that it can be detected with fluorescently-labeled streptavidin. The term "biological
sample" is intended to include tissues, cells and biological fluids isolated from a subject, as well
as tissues, cells and fluids present within a subject. That is, the detection method of the invention
can be used to detect SECP mRNA, protein, or genomic DNA in a biological sample in vitro as
well as in vivo. For example, in vitro techniques for detection of SECP mRNA include Northern
hybridizations and in situ hybridizations. In vitro techniques for detection of SECP protein
include enzyme linked immunosorbent assays (ELISAs), Western blots, immunoprecipitations,
and immunofluorescence. In vitro techniques for detection of SECP genomic DNA include
Southern hybridizations. Furthermore, in vivo techniques for detection of SECP protein include
introducing into a subject a labeled anti-SECP antibody. For example, the antibody can be
labeled with a radioactive marker whose presence and location in a subject can be detected by
standard imaging techniques.

In one embodiment, the biological sample contains protein molecules from the test 
subject. Alternatively, the biological sample can contain mRNA molecules from the test subject 
or genomic DNA molecules from the test subject. A preferred biological sample is a peripheral 
blood leukocyte sample isolated by conventional means from a subject.

In another embodiment, the methods further involve obtaining a control biological 
sample from a control subject, contacting the control sample with a compound or agent capable 
of detecting SECP protein, mRNA, or genomic DNA, such that the presence of SECP protein, 
mRNA or genomic DNA is detected in the biological sample, and comparing the presence of 
SECP protein, mRNA or genomic DNA in the control sample with the presence of SECP
protein, mRNA or genomic DNA in the test sample. 

The invention also encompasses kits for detecting the presence of SECP in a biological 
sample. For example, the kit can comprise: a labeled compound or agent capable of detecting 
SECP protein or mRNA in a biological sample; means for determining the amount of SECP in 
the sample; and means for comparing the amount of SECP in the sample with a standard. The
compound or agent can be packaged in a suitable container. The kit can further comprise 
instructions for using the kit to detect SECP protein or nucleic acid. 

Prognostic Assays 

The diagnostic methods described herein can furthermore be utilized to identify subjects 
having or at risk of developing a disease or disorder associated with aberrant SECP expression or
activity. For example, the assays described herein, such as the preceding diagnostic assays or the 
following assays, can be utilized to identify a subject having or at risk of developing a disorder 
associated with SECP protein, nucleic acid expression or activity. Alternatively, the prognostic 
assays can be utilized to identify a subject having or at risk for developing a disease or disorder. 
Thus, the invention provides a method for identifying a disease or disorder associated with
aberrant SECP expression or activity in which a test sample is obtained from a subject and SECP
protein or nucleic acid (e.g., mRNA, genomic DNA) is detected, wherein the presence of SECP
protein or nucleic acid is diagnostic for a subject having or at risk of developing a disease or
disorder associated with aberrant SECP expression or activity. As used herein, a "test sample"
refers to a biological sample obtained from a subject of interest. For example, a test sample can 
be a biological fluid (e.g., serum), cell sample, or tissue. 

Furthermore, the prognostic assays described herein can be used to determine whether a 
subject can be administered an agent (e.g., an agonist, antagonist, peptidomimetic, protein,
peptide, nucleic acid, small molecule, or other drug candidate) to treat a disease or disorder 
associated with aberrant SECP expression or activity. For example, such methods can be used to 
determine whether a subject can be effectively treated with an agent for a disorder. Thus, the 
invention provides methods for determining whether a subject can be effectively treated with an 
agent for a disorder associated with aberrant SECP expression or activity in which a test sample
is obtained and SECP protein or nucleic acid is detected (e.g., wherein the presence of SECP 
protein or nucleic acid is diagnostic for a subject that can be administered the agent to treat a 
disorder associated with aberrant SECP expression or activity). 

The methods of the invention can also be used to detect genetic lesions in a SECP gene,
thereby determining if a subject with the lesioned gene is at risk for a disorder characterized by
aberrant cell proliferation and/or differentiation. In various embodiments, the methods include
detecting, in a sample of cells from the subject, the presence or absence of a genetic lesion
characterized by at least one of an alteration affecting the integrity of a gene encoding a
SECP-protein, or the mis-expression of the SECP gene. For example, such genetic lesions can
be detected by ascertaining the existence of at least one of: (i) a deletion of one or more
nucleotides from a SECP gene; (ii) an addition of one or more nucleotides to a SECP gene;
(iii) a substitution of one or more nucleotides of a SECP gene; (iv) a chromosomal rearrangement
of a SECP gene; (v) an alteration in the level of a messenger RNA transcript of a SECP gene;
(vi) aberrant modification of a SECP gene, such as of the methylation pattern of the genomic
DNA; (vii) the presence of a non-wild-type splicing pattern of a messenger RNA transcript of a
SECP gene; (viii) a non-wild-type level of a SECP protein; (ix) allelic loss of a SECP gene; and
(x) inappropriate post-translational modification of a SECP protein. As described herein, there
are a large number of assay techniques known in the art which can be used for detecting lesions
in a SECP gene. A preferred biological sample is a peripheral blood leukocyte sample isolated
by conventional means from a subject. However, any biological sample containing nucleated
cells may be used, including, for example, buccal mucosal cells.

In certain embodiments, detection of the lesion involves the use of a probe/primer in a
polymerase chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202), such as
anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g.,
Landegran, et al., 1988. Science 241: 1077-1080; and Nakazawa, et al., 1994. Proc. Natl. Acad.
Sci. USA 91: 360-364), the latter of which can be particularly useful for detecting point
mutations in the SECP-gene (see, Abravaya, et al., 1995. Nucl. Acids Res. 23: 675-682). This
method can include the steps of collecting a sample of cells from a patient, isolating nucleic acid
(e.g., genomic, mRNA or both) from the cells of the sample, contacting the nucleic acid sample
with one or more primers that specifically hybridize to a SECP gene under conditions such that
hybridization and amplification of the SECP gene (if present) occurs, and detecting the presence
or absence of an amplification product, or detecting the size of the amplification product and
comparing the length to a control sample. It is anticipated that PCR and/or LCR may be
desirable to use as a preliminary amplification step in conjunction with any of the techniques
used for detecting mutations described herein.

Alternative amplification methods include: self sustained sequence replication (see,
Guatelli, et al., 1990. Proc. Natl. Acad. Sci. USA 87: 1874-1878); transcriptional amplification
system (see, Kwoh, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 1173-1177); Qβ Replicase (see,
Lizardi, et al., 1988. BioTechnology 6: 1197); or any other nucleic acid amplification method,
followed by the detection of the amplified molecules using techniques well known to those of 
skill in the art. These detection schemes are especially useful for the detection of nucleic acid 
molecules if such molecules are present in very low numbers. 

In an alternative embodiment, mutations in a SECP gene from a sample cell can be 
identified by alterations in restriction enzyme cleavage patterns. For example, sample and
control DNA is isolated, amplified (optionally), digested with one or more restriction
endonucleases, and fragment length sizes are determined by gel electrophoresis and compared.
Differences in fragment length sizes between sample and control DNA indicate mutations in the
sample DNA. Moreover, sequence specific ribozymes (see, e.g., U.S. Patent No.
5,493,531) can be used to score for the presence of specific mutations by development or loss of
a ribozyme cleavage site. 
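
The comparison of restriction fragment lengths between sample and control DNA can be sketched in silico as follows. This Python example is a simplified illustration: the function names are hypothetical, the DNA is treated as a single linear top strand, and the default enzyme (EcoRI, which recognizes GAATTC and cuts after the first G) is chosen only as a familiar example.

```python
def fragment_lengths(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Return sorted fragment lengths after cutting a linear sequence.

    The sequence is cut `cut_offset` bases into every occurrence of
    the recognition `site` (top strand only, a simplification).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - begin for begin, end in zip(bounds, bounds[1:]))


def patterns_differ(control: str, sample: str, **digest_options) -> bool:
    """True when the two digests yield different fragment length patterns."""
    return fragment_lengths(control, **digest_options) != fragment_lengths(sample, **digest_options)


# Toy sequences: the sample carries a single base change that destroys
# the second recognition site, so two fragments merge into one.
control_dna = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
sample_dna = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TT"
```

Only mutations that create or destroy a recognition site, or that change the distance between sites, alter the pattern; other substitutions are invisible to this comparison.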

In other embodiments, genetic mutations in SECP can be identified by hybridizing a
sample and control nucleic acids, e.g., DNA or RNA, to high-density arrays containing hundreds
or thousands of oligonucleotide probes. See, e.g., Cronin, et al., 1996. Human Mutation 7:
244-255; Kozal, et al., 1996. Nat. Med. 2: 753-759. For example, genetic mutations in SECP can
be identified in two dimensional arrays containing light-generated DNA probes as described in
Cronin, et al., supra. Briefly, a first hybridization array of probes can be used to scan through
long stretches of DNA in a sample and control to identify base changes between the sequences
by making linear arrays of sequential overlapping probes. This step allows the identification of
point mutations. This is followed by a second hybridization array that allows the
characterization of specific mutations by using smaller, specialized probe arrays complementary
to all variants or mutations detected. Each mutation array is composed of parallel probe sets, one
complementary to the wild-type gene and the other complementary to the mutant gene.
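
The two array designs described above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names, the probe length of 25 bases, and the step size are hypothetical, and probes are written in the same orientation as the input sequence for readability (an actual array would carry probes complementary to the labeled target).

```python
def tiling_probes(seq: str, probe_len: int = 25, step: int = 1) -> list:
    """Sequential overlapping probes that scan along a sequence."""
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    return [seq[i:i + probe_len] for i in range(0, len(seq) - probe_len + 1, step)]


def paired_probe_set(wild_type: str, position: int, mutant_base: str,
                     probe_len: int = 25) -> tuple:
    """Return (wild_type_probe, mutant_probe) covering one variant position.

    The probe window is centered on `position` where possible and
    shifted inward near the ends of the sequence.
    """
    if len(wild_type) < probe_len or not 0 <= position < len(wild_type):
        raise ValueError("position or probe length out of range")
    half = probe_len // 2
    start = max(0, min(position - half, len(wild_type) - probe_len))
    wt_probe = wild_type[start:start + probe_len]
    offset = position - start
    mutant_probe = wt_probe[:offset] + mutant_base + wt_probe[offset + 1:]
    return wt_probe, mutant_probe
```

The first function corresponds to the scanning array and the second to one parallel probe set of a mutation array; both are sketches rather than a description of any particular commercial array layout.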

In yet another embodiment, any of a variety of sequencing reactions known in the art can
be used to directly sequence the SECP gene and detect mutations by comparing the sequence of
the sample SECP with the corresponding wild-type (control) sequence. Examples of sequencing
reactions include those based on techniques developed by Maxam and Gilbert, 1977. Proc. Natl.
Acad. Sci. USA 74: 560 or Sanger, 1977. Proc. Natl. Acad. Sci. USA 74: 5463. It is also
contemplated that any of a variety of automated sequencing procedures can be utilized when
performing the diagnostic assays (see, e.g., Naeve, et al., 1995. Biotechniques 19: 448),
including sequencing by mass spectrometry (see, e.g., PCT International Publication No. WO
94/16101; Cohen, et al., 1996. Adv. Chromatography 36: 127-162; and Griffin, et al., 1993.
Appl. Biochem. Biotechnol. 38: 147-159).
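
Once both sequences are in hand, the comparison of the sample with the wild-type (control) sequence can be sketched with Python's standard `difflib` module. This is an illustration only: the function name and the output format are assumptions, and `difflib` is a general-purpose text comparison tool rather than a dedicated sequence aligner, so long or repetitive sequences would call for a proper alignment program.

```python
from difflib import SequenceMatcher


def classify_differences(wild_type: str, sample: str) -> list:
    """List differences between a wild-type and a sample sequence.

    Each entry is (kind, wild_type_position, wild_type_bases, sample_bases),
    where kind is 'substitution', 'deletion', or 'insertion'.
    """
    kinds = {"replace": "substitution", "delete": "deletion", "insert": "insertion"}
    matcher = SequenceMatcher(None, wild_type, sample, autojunk=False)
    events = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        events.append((kinds[tag], i1, wild_type[i1:i2], sample[j1:j2]))
    return events
```

An empty result means the two sequences are identical; distinguishing a disease-associated mutation from a harmless polymorphism still requires comparison across several individuals, as the text notes elsewhere.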

Other methods for detecting mutations in the SECP gene include methods in which
protection from cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA
heteroduplexes. See, e.g., Myers, et al., 1985. Science 230: 1242. In general, the art technique
of "mismatch cleavage" starts by providing heteroduplexes formed by hybridizing (labeled)
RNA or DNA containing the wild-type SECP sequence with potentially mutant RNA or DNA
obtained from a tissue sample. The double-stranded duplexes are treated with an agent that
cleaves single-stranded regions of the duplex, such as those that exist due to basepair
mismatches between the control and sample strands. For instance, RNA/DNA duplexes can be
treated with RNase and DNA/DNA hybrids treated with S1 nuclease to enzymatically digest
the mismatched regions. In other embodiments, either DNA/DNA or RNA/DNA duplexes can
be treated with hydroxylamine or osmium tetroxide and with piperidine in order to digest
mismatched regions. After digestion of the mismatched regions, the resulting material is then
separated by size on denaturing polyacrylamide gels to determine the site of mutation. See, e.g.,
Cotton, et al., 1988. Proc. Natl. Acad. Sci. USA 85: 4397; Saleeba, et al., 1992. Methods
Enzymol. 217: 286-295. In an embodiment, the control DNA or RNA can be labeled for
detection.

In still another embodiment, the mismatch cleavage reaction employs one or more
proteins that recognize mismatched base pairs in double-stranded DNA (so called "DNA
mismatch repair" enzymes) in defined systems for detecting and mapping point mutations in
SECP cDNAs obtained from samples of cells. For example, the mutY enzyme of E. coli cleaves
A at G/A mismatches and the thymidine DNA glycosylase from HeLa cells cleaves T at G/T
mismatches. See, e.g., Hsu, et al., 1994. Carcinogenesis 15: 1657-1662. According to an
exemplary embodiment, a probe based on a SECP sequence, e.g., a wild-type SECP sequence, is
hybridized to a cDNA or other DNA product from a test cell(s). The duplex is treated with a
DNA mismatch repair enzyme, and the cleavage products, if any, can be detected from
electrophoresis protocols or the like. See, e.g., U.S. Patent No. 5,459,039.

In other embodiments, alterations in electrophoretic mobility will be used to identify
mutations in SECP genes. For example, single strand conformation polymorphism (SSCP) may
be used to detect differences in electrophoretic mobility between mutant and wild type nucleic
acids. See, e.g., Orita, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 2766; Cotton, 1993. Mutat.
Res. 285: 125-144; Hayashi, 1992. Genet. Anal. Tech. Appl. 9: 73-79. Single-stranded DNA
fragments of sample and control SECP nucleic acids will be denatured and allowed to renature.
The secondary structure of single-stranded nucleic acids varies according to sequence; the
resulting alteration in electrophoretic mobility enables the detection of even a single base change.
The DNA fragments may be labeled or detected with labeled probes. The sensitivity of the assay
may be enhanced by using RNA (rather than DNA), in which the secondary structure is more
sensitive to a change in sequence. In one embodiment, the subject method utilizes heteroduplex
analysis to separate double stranded heteroduplex molecules on the basis of changes in
electrophoretic mobility. See, e.g., Keen, et al., 1991. Trends Genet. 7: 5.

In yet another embodiment, the movement of mutant or wild-type fragments in 
polyacrylamide gels containing a gradient of denaturant is assayed using denaturing gradient gel 
electrophoresis (DGGE). See, e.g., Myers, et al., 1985. Nature 313: 495. When DGGE is used
as the method of analysis, DNA will be modified to ensure that it does not completely denature,
for example by adding a GC clamp of approximately 40 bp of high-melting GC-rich DNA by
PCR. In a further embodiment, a temperature gradient is used in place of a denaturing gradient 
to identify differences in the mobility of control and sample DNA. See, e.g., Rosenbaum and 
Reissner, 1987. Biophys. Chem. 265: 12753. 

Examples of other techniques for detecting point mutations include, but are not limited
to, selective oligonucleotide hybridization, selective amplification, or selective primer extension.
For example, oligonucleotide primers may be prepared in which the known mutation is placed
centrally and then hybridized to target DNA under conditions that permit hybridization only if a
perfect match is found. See, e.g., Saiki, et al., 1986. Nature 324: 163; Saiki, et al., 1989. Proc.
Natl. Acad. Sci. USA 86: 6230. Such allele specific oligonucleotides are hybridized to PCR
amplified target DNA or a number of different mutations when the oligonucleotides are attached
to the hybridizing membrane and hybridized with labeled target DNA.

Alternatively, allele specific amplification technology that depends on selective PCR
amplification may be used in conjunction with the instant invention. Oligonucleotides used as
primers for specific amplification may carry the mutation of interest in the center of the molecule
(so that amplification depends on differential hybridization; see, e.g., Gibbs, et al., 1989. Nucl.
Acids Res. 17: 2437-2448) or at the extreme 3'-terminus of one primer where, under appropriate
conditions, mismatch can prevent or reduce polymerase extension (see, e.g., Prossner, 1993.
Tibtech. 11: 238). In addition it may be desirable to introduce a novel restriction site in the
region of the mutation to create cleavage-based detection. See, e.g., Gasparini, et al., 1992. Mol.
Cell Probes 6: 1. It is anticipated that in certain embodiments amplification may also be
performed using Taq ligase. See, e.g., Barany, 1991. Proc. Natl. Acad. Sci.
USA 88: 189. In such cases, ligation will occur only if there is a perfect match at the 3'-terminus
of the 5' sequence, making it possible to detect the presence of a known mutation at a specific
site by looking for the presence or absence of amplification.
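
The second primer design mentioned above, in which the variant base sits at the extreme 3'-terminus, can be sketched in Python. The function name and the default primer length are hypothetical choices for illustration, and only forward (sense-strand) primers are generated.

```python
def allele_specific_primers(wild_type: str, position: int, mutant_base: str,
                            length: int = 20) -> tuple:
    """Return (wild_type_primer, mutant_primer), each `length` bases long.

    Both primers share the same 5' stem and differ only in their final
    (3'-terminal) base, which lies on the variant `position`.
    """
    if not 0 <= position < len(wild_type) or position + 1 < length:
        raise ValueError("not enough upstream sequence for a primer of this length")
    stem = wild_type[position - length + 1:position]
    return stem + wild_type[position], stem + mutant_base
```

In use, each primer would be paired with a common reverse primer in a separate reaction, and amplification with only one of the two primers would suggest which allele is present. Reaction conditions that make the 3' mismatch discriminating are outside the scope of this sketch.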

The methods described herein may be performed, for example, by utilizing pre-packaged 
diagnostic kits comprising at least one probe nucleic acid or antibody reagent described herein, 
which may be conveniently used, e.g., in clinical settings to diagnose patients exhibiting 
symptoms or family history of a disease or illness involving a SECP gene. 

Furthermore, any cell type or tissue, preferably peripheral blood leukocytes, in which
SECP is expressed may be utilized in the prognostic assays described herein. However, any
biological sample containing nucleated cells may be used, including, for example, buccal
mucosal cells.

Pharmacogenomics 

Agents, or modulators that have a stimulatory or inhibitory effect on SECP activity (e.g.,
SECP gene expression), as identified by a screening assay described herein can be administered
to individuals to treat (prophylactically or therapeutically) disorders (e.g., cancer or immune
disorders) associated with aberrant SECP activity. In conjunction with such treatment, the
pharmacogenomics (i.e., the study of the relationship between an individual's genotype and that
individual's response to a foreign compound or drug) of the individual may be considered.
Differences in metabolism of therapeutics can lead to severe toxicity or therapeutic failure by
altering the relation between dose and blood concentration of the pharmacologically active drug.
Thus, the pharmacogenomics of the individual permits the selection of effective agents (e.g.,
drugs) for prophylactic or therapeutic treatments based on a consideration of the individual's
genotype. Such pharmacogenomics can further be used to determine appropriate dosages and
therapeutic regimens. Accordingly, the activity of SECP protein, expression of SECP nucleic
acid, or mutation content of SECP genes in an individual can be determined to thereby select
appropriate agent(s) for therapeutic or prophylactic treatment of the individual.

Pharmacogenomics deals with clinically significant hereditary variations in the response
to drugs due to altered drug disposition and abnormal action in affected persons. See, e.g.,
Eichelbaum, 1996. Clin. Exp. Pharmacol. Physiol. 23: 983-985; Linder, 1997. Clin. Chem., 43:
254-266. In general, two types of pharmacogenetic conditions can be differentiated: genetic
conditions transmitted as a single factor altering the way drugs act on the body (altered drug
action) and genetic conditions transmitted as single factors altering the way the body acts on drugs
(altered drug metabolism). These pharmacogenetic conditions can occur either as rare defects or
as polymorphisms. For example, glucose-6-phosphate dehydrogenase (G6PD) deficiency is a
common inherited enzymopathy in which the main clinical complication is hemolysis after
ingestion of oxidant drugs (anti-malarials, sulfonamides, analgesics, nitrofurans) and
consumption of fava beans.

As an illustrative embodiment, the activity of drug metabolizing enzymes is a major
determinant of both the intensity and duration of drug action. The discovery of genetic
polymorphisms of drug metabolizing enzymes (e.g., N-acetyltransferase 2 (NAT 2) and
cytochrome P450 enzymes CYP2D6 and CYP2C19) has provided an explanation as to why
some patients do not obtain the expected drug effects or show exaggerated drug response and
serious toxicity after taking the standard and safe dose of a drug. These polymorphisms are
expressed in two phenotypes in the population, the extensive metabolizer (EM) and poor
metabolizer (PM). The prevalence of PM is different among different populations. For example,
the gene coding for CYP2D6 is highly polymorphic and several mutations have been identified
in PM, which all lead to the absence of functional CYP2D6. Poor metabolizers of CYP2D6 and
CYP2C19 quite frequently experience exaggerated drug response and side effects when they
receive standard doses. If a metabolite is the active therapeutic moiety, PM show no therapeutic
response, as demonstrated for the analgesic effect of codeine mediated by its CYP2D6-formed
metabolite morphine. At the other extreme are the so-called ultra-rapid metabolizers who do not
respond to standard doses. Recently, the molecular basis of ultra-rapid metabolism has been
identified to be due to CYP2D6 gene amplification.
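
The relationship between genotype and metabolizer phenotype described above can be caricatured in a few lines of Python. This is a deliberately simplified sketch, not a clinical rule: the function name and the copy-number thresholds are assumptions made for illustration, based only on the statements that poor metabolizers lack functional CYP2D6 and that ultra-rapid metabolism is linked to gene amplification.

```python
def metabolizer_phenotype(functional_copies: int) -> str:
    """Map a count of functional gene copies to a phenotype label.

    Hypothetical thresholds: 0 copies -> poor metabolizer (PM),
    1 or 2 copies -> extensive metabolizer (EM),
    more than 2 copies (amplification) -> ultra-rapid metabolizer.
    """
    if functional_copies < 0:
        raise ValueError("copy number cannot be negative")
    if functional_copies == 0:
        return "PM"
    if functional_copies <= 2:
        return "EM"
    return "ultra-rapid"
```

Actual phenotype assignment depends on the specific alleles present and their residual activity, which a bare copy count does not capture.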

Thus, the activity of SECP protein, expression of SECP nucleic acid, or mutation content
of SECP genes in an individual can be determined to thereby select appropriate agent(s) for
therapeutic or prophylactic treatment of the individual. In addition, pharmacogenetic studies can
be used to apply genotyping of polymorphic alleles encoding drug-metabolizing enzymes to the
identification of an individual's drug responsiveness phenotype. This knowledge, when applied
to dosing or drug selection, can avoid adverse reactions or therapeutic failure and thus enhance
therapeutic or prophylactic efficiency when treating a subject with a SECP modulator, such as a
modulator identified by one of the exemplary screening assays described herein.

Monitoring of Effects During Clinical Trials 

Monitoring the influence of agents (e.g., drugs, compounds) on the expression or activity
of SECP (e.g., the ability to modulate aberrant cell proliferation and/or differentiation) can be
applied not only in basic drug screening, but also in clinical trials. For example, the
effectiveness of an agent determined by a screening assay as described herein to increase SECP
gene expression, protein levels, or upregulate SECP activity, can be monitored in clinical trials
of subjects exhibiting decreased SECP gene expression, protein levels, or down-regulated SECP
activity. Alternatively, the effectiveness of an agent determined by a screening assay to decrease
SECP gene expression, protein levels, or down-regulate SECP activity, can be monitored in
clinical trials of subjects exhibiting increased SECP gene expression, protein levels, or
up-regulated SECP activity. In such clinical trials, the expression or activity of SECP and,
preferably, other genes that have been implicated in, for example, a cellular proliferation or
immune disorder can be used as a "read out" or markers of the immune responsiveness of a
particular cell.

By way of example, and not of limitation, genes, including SECP, that are modulated in
cells by treatment with an agent (e.g., compound, drug or small molecule) that modulates SECP
activity (e.g., identified in a screening assay as described herein) can be identified. Thus, to
study the effect of agents on cellular proliferation disorders, for example, in a clinical trial, cells
can be isolated and RNA prepared and analyzed for the levels of expression of SECP and other
genes implicated in the disorder. The levels of gene expression (i.e., a gene expression pattern)
can be quantified by Northern blot analysis or RT-PCR, as described herein, or alternatively by
measuring the amount of protein produced, by one of the methods as described herein, or by
measuring the levels of activity of SECP or other genes. In this manner, the gene expression
pattern can serve as a marker, indicative of the physiological response of the cells to the agent.
Accordingly, this response state may be determined before, and at various points during,
treatment of the individual with the agent.

In one embodiment, the invention provides a method for monitoring the effectiveness of
treatment of a subject with an agent (e.g., an agonist, antagonist, protein, peptide,
peptidomimetic, nucleic acid, small molecule, or other drug candidate identified by the screening
assays described herein) comprising the steps of (i) obtaining a pre-administration sample from a
subject prior to administration of the agent; (ii) detecting the level of expression of a SECP
protein, mRNA, or genomic DNA in the pre-administration sample; (iii) obtaining one or more
post-administration samples from the subject; (iv) detecting the level of expression or activity of
the SECP protein, mRNA, or genomic DNA in the post-administration samples; (v) comparing
the level of expression or activity of the SECP protein, mRNA, or genomic DNA in the
pre-administration sample with the SECP protein, mRNA, or genomic DNA in the
post-administration sample or samples; and (vi) altering the administration of the agent to the subject
accordingly. For example, increased administration of the agent may be desirable to increase the
expression or activity of SECP to higher levels than detected, i.e., to increase the effectiveness of
the agent. Alternatively, decreased administration of the agent may be desirable to decrease
expression or activity of SECP to lower levels than detected, i.e., to decrease the effectiveness of
the agent.

Methods of Treatment 

The invention provides for both prophylactic and therapeutic methods of treating a
subject at risk of (or susceptible to) a disorder or having a disorder associated with aberrant
SECP expression or activity. These methods of treatment will be discussed more fully, below.

Disease and Disorders 

Diseases and disorders that are characterized by increased (relative to a subject not
suffering from the disease or disorder) levels or biological activity may be treated with
Therapeutics that antagonize (i.e., reduce or inhibit) activity. Therapeutics that antagonize
activity may be administered in a therapeutic or prophylactic manner. Therapeutics that may be
utilized include, but are not limited to: (i) an aforementioned peptide, or analogs, derivatives,
fragments or homologs thereof; (ii) antibodies to an aforementioned peptide; (iii) nucleic acids
encoding an aforementioned peptide; (iv) administration of antisense nucleic acid and nucleic
acids that are "dysfunctional" (i.e., due to a heterologous insertion within the coding sequences
of an aforementioned peptide) that are utilized to "knockout" endogenous
function of an aforementioned peptide by homologous recombination (see, e.g., Capecchi, 1989.
Science 244: 1288-1292); or (v) modulators (i.e., inhibitors, agonists and antagonists, including
additional peptide mimetic of the invention or antibodies specific to a peptide of the invention)
that alter the interaction between an aforementioned peptide and its binding partner.

Diseases and disorders that are characterized by decreased (relative to a subject not
suffering from the disease or disorder) levels or biological activity may be treated with
Therapeutics that increase (i.e., are agonists to) activity. Therapeutics that upregulate activity
may be administered in a therapeutic or prophylactic manner. Therapeutics that may be utilized
include, but are not limited to, an aforementioned peptide, or analogs, derivatives, fragments or
homologs thereof; or an agonist that increases bioavailability.

Increased or decreased levels can be readily detected by quantifying peptide and/or RNA,
by obtaining a patient tissue sample (e.g., from biopsy tissue) and assaying it in vitro for RNA or
peptide levels, structure and/or activity of the expressed peptides (or mRNAs of an
aforementioned peptide). Methods that are well-known within the art include, but are not limited
to, immunoassays (e.g., by Western blot analysis, immunoprecipitation followed by sodium
dodecyl sulfate (SDS) polyacrylamide gel electrophoresis, immunocytochemistry, etc.) and/or
hybridization assays to detect expression of mRNAs (e.g., Northern assays, dot blots, in situ
hybridization, and the like).

Prophylactic Methods 

In one aspect, the invention provides a method for preventing, in a subject, a disease or
condition associated with an aberrant SECP expression or activity, by administering to the
subject an agent that modulates SECP expression or at least one SECP activity. Subjects at risk
for a disease that is caused or contributed to by aberrant SECP expression or activity can be
identified by, for example, any or a combination of diagnostic or prognostic assays as described
herein. Administration of a prophylactic agent can occur prior to the manifestation of symptoms
characteristic of the SECP aberrancy, such that a disease or disorder is prevented or,
alternatively, delayed in its progression. Depending upon the type of SECP aberrancy, for
example, a SECP agonist or SECP antagonist agent can be used for treating the subject. The
appropriate agent can be determined based on screening assays described herein.

Therapeutic Methods 

Another aspect of the invention pertains to methods of modulating SECP expression or
activity for therapeutic purposes. The modulatory method of the invention involves contacting a
cell with an agent that modulates one or more of the SECP protein activities
associated with the cell. An agent that modulates SECP protein activity can be an agent as
described herein, such as a nucleic acid or a protein, a naturally-occurring cognate ligand of a
SECP protein, a peptide, a SECP peptidomimetic, or other small molecule. In one embodiment,
the agent stimulates one or more SECP protein activities. Examples of such stimulatory agents
include active SECP protein and a nucleic acid molecule encoding SECP that has been
introduced into the cell. In another embodiment, the agent inhibits one or more SECP protein
activities. Examples of such inhibitory agents include antisense SECP nucleic acid molecules and
anti-SECP antibodies. These modulatory methods can be performed in vitro (e.g., by culturing
the cell with the agent) or, alternatively, in vivo (e.g., by administering the agent to a subject).
As such, the invention provides methods of treating an individual afflicted with a disease or
disorder characterized by aberrant expression or activity of a SECP protein or nucleic acid
molecule. In one embodiment, the method involves administering an agent (e.g., an agent
identified by a screening assay described herein), or combination of agents that modulates (e.g.,
up-regulates or down-regulates) SECP expression or activity. In another embodiment, the
method involves administering a SECP protein or nucleic acid molecule as therapy to
compensate for reduced or aberrant SECP expression or activity.

Stimulation of SECP activity is desirable in situations in which SECP is abnormally
down-regulated and/or in which increased SECP activity is likely to have a beneficial effect.
One example of such a situation is where a subject has a disorder characterized by aberrant cell
proliferation and/or differentiation (e.g., cancer or immune associated disorders). Another
example of such a situation is where the subject has a gestational disease (e.g., pre-eclampsia).

Determination of the Biological Effect of the Therapeutic

In various embodiments of the invention, suitable in vitro or in vivo assays are performed
to determine the effect of a specific Therapeutic and whether its administration is indicated for
treatment of the affected tissue.

In various specific embodiments, in vitro assays may be performed with representative
cells of the type(s) involved in the patient's disorder, to determine if a given Therapeutic exerts
the desired effect upon the cell type(s). Compounds for use in therapy may be tested in suitable
animal model systems including, but not limited to, rats, mice, chickens, cows, monkeys, rabbits,
and the like, prior to testing in human subjects. Similarly, for in vivo testing, any of the animal
model systems known in the art may be used prior to administration to human subjects.

Prophylactic and Therapeutic Uses of the Compositions of the Invention

The SECP nucleic acids and proteins of the invention may be useful in a variety of
potential prophylactic and therapeutic applications. By way of a non-limiting example, a cDNA
encoding the SECP protein of the invention may be useful in gene therapy, and the protein may
be useful when administered to a subject in need thereof.

Both the novel nucleic acids encoding the SECP proteins, and the SECP proteins of the
invention, or fragments thereof, may also be useful in diagnostic applications, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These materials are
further useful in the generation of antibodies which immunospecifically bind to the novel
substances of the invention for use in therapeutic or diagnostic methods.

The invention will be further illustrated in the following non-limiting examples. 

Example 1: Radiation Hybrid Mapping Provides the Chromosomal
Location of SECP2 (Clone 11618130.0.27)

Radiation hybrid mapping using human chromosome markers was carried out to
determine the chromosomal location of a SECP2 nucleic acid of the invention. The procedure
used to obtain these results is described generally in Steen, et al., 1999. A High-Density
Integrated Genetic Linkage and Radiation Hybrid Map of the Laboratory Rat, Genome Res. 9:
AP1-AP8 (Published Online on May 21, 1999). A panel of 93 cell clones containing randomized
radiation-induced human chromosomal fragments was then screened in 96 well plates using PCR
primers designed to identify the sought clones in a unique fashion. Clone 11618130.0.27, a
SECP2 nucleic acid, was located on chromosome 16 at a map distance of 26.0 cR from marker
WI-3768 and -70.5 cR from marker TIGR-A002K05.

Example 2: Molecular Cloning of Clone 11618130 

Oligonucleotide PCR primers were designed to amplify a DNA segment coding for the
full length open reading frame of clone 11618130. The forward primer included a Bgl II
restriction site and the consensus Kozak sequence CCACC. The reverse primer contained an
in-frame XhoI restriction site. Both primers contained a CTCGTC 5'-terminus clamp. The
nucleotide sequences of the primers were:

11618130 Forward Primer:

CTCGTCAGATCTCCACCATGAGTGATGAGGACAGCTGTGTAG (SEQ ID NO:19)

11618130 Reverse Primer:

CTCGTCCTCGAGGCAGCTGGTTGGTTGGCTTATGTTG (SEQ ID NO:20)
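The layout of these two primers can be checked mechanically. The following is a minimal sketch, not part of the described procedure: it peels the 5' clamp, the restriction site, and (for the forward primer) the Kozak sequence off each primer and returns the remaining gene-specific portion. The recognition sequences AGATCT (Bgl II) and CTCGAG (XhoI) are the standard sites for those enzymes; the function name `split_primer` is a hypothetical helper introduced only for illustration.

```python
# Primer sequences copied from Example 2 (SEQ ID NO:19 and SEQ ID NO:20).
FORWARD = "CTCGTCAGATCTCCACCATGAGTGATGAGGACAGCTGTGTAG"
REVERSE = "CTCGTCCTCGAGGCAGCTGGTTGGTTGGCTTATGTTG"

CLAMP = "CTCGTC"   # 5'-terminus clamp named in the text
BGL_II = "AGATCT"  # standard Bgl II recognition site
XHO_I = "CTCGAG"   # standard XhoI recognition site
KOZAK = "CCACC"    # consensus Kozak sequence named in the text


def split_primer(primer, elements):
    """Remove the named elements from the 5' end, in order.

    Returns (parts, remainder), where remainder is the gene-specific 3' portion.
    Raises ValueError if an element is not found where it is expected.
    """
    parts = {}
    rest = primer
    for name, seq in elements:
        if not rest.startswith(seq):
            raise ValueError(f"{name} ({seq}) not found at the expected position")
        parts[name] = seq
        rest = rest[len(seq):]
    return parts, rest


fwd_parts, fwd_gene = split_primer(
    FORWARD, [("clamp", CLAMP), ("Bgl II", BGL_II), ("Kozak", KOZAK)]
)
rev_parts, rev_gene = split_primer(REVERSE, [("clamp", CLAMP), ("XhoI", XHO_I)])

print("forward gene-specific portion:", fwd_gene)
print("reverse gene-specific portion:", rev_gene)
```

In the forward primer the gene-specific portion begins with the ATG start codon immediately after the Kozak sequence, which matches the stated aim of amplifying the full length open reading frame.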

The PCR reactions included: 5 ng human fetal brain cDNA template; 1 µM of each of the
11618130 Forward and 11618130 Reverse primers; 5 µM dNTP (Clontech Laboratories; Palo
Alto, CA) and 1 µl of 50x Advantage-HF 2 polymerase (Clontech Laboratories; Palo Alto, CA)
in 50 µl total reaction volume. The following PCR conditions were used:

a) 96°C 3 minutes
b) 96°C 30 seconds denaturation
c) 70°C 30 seconds, primer annealing. This temperature was gradually decreased by 1°C/cycle
d) 72°C 1 minute extension.
Repeat steps b-d a total of 10-times
e) 96°C 30 seconds denaturation
f) 60°C 30 seconds annealing
g) 72°C 1 minute extension
Repeat steps e-g a total of 25-times
h) 72°C 5 minutes final extension
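The cycling conditions above describe a touchdown program: the annealing temperature starts high and is lowered each cycle before a fixed-temperature phase. The sketch below expands steps a) through h) into an explicit list of holds. It assumes that the first touchdown cycle anneals at 70°C and that the 1°C decrement is applied after each cycle, which the text does not state explicitly; the function name and parameters are hypothetical, and ramp times between holds are ignored.

```python
def touchdown_program(start_anneal=70, step=1, touchdown_cycles=10,
                      final_anneal=60, final_cycles=25):
    """Expand the cycling conditions into a list of (temp_C, seconds, label) holds."""
    program = [(96, 180, "initial denaturation")]            # step a
    for cycle in range(touchdown_cycles):                    # steps b-d, 10 times
        program.append((96, 30, "denaturation"))
        # Assumption: cycle 0 anneals at 70 C, then 1 C lower per cycle.
        program.append((start_anneal - cycle * step, 30, "annealing"))
        program.append((72, 60, "extension"))
    for _ in range(final_cycles):                            # steps e-g, 25 times
        program.append((96, 30, "denaturation"))
        program.append((final_anneal, 30, "annealing"))
        program.append((72, 60, "extension"))
    program.append((72, 300, "final extension"))             # step h
    return program


program = touchdown_program()
anneal_temps = [temp for temp, _, label in program if label == "annealing"]
total_minutes = sum(seconds for _, seconds, _ in program) / 60

print("touchdown annealing temperatures:", anneal_temps[:10])
print("total hold time (min, excluding ramps):", total_minutes)
```

Under these assumptions the touchdown phase anneals at 70°C down to 61°C, so the fixed 60°C annealing of steps e-g continues the same 1°C progression.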

A single, amplified product of approximately 800 bp was detected by agarose gel
electrophoresis. The PCR amplification product was then isolated by the QIAEX II® Gel
Extraction System (QIAGEN, Inc; Valencia, CA) in a final volume of 20 µl.

A total of 10 µl of the isolated fragment was digested with Bgl II and XhoI restriction
enzymes, and ligated into the BamHI- and XhoI-digested mammalian expression vector
pcDNA3.1 V5His (Invitrogen; Carlsbad, CA). The construct was sequenced, and the cloned
insert was verified as a sequence identical to the ORF coding for the full length 11618130. The
construct was designated pcDNA3.1-11618130-S178-2.
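The insert is cut with Bgl II while the vector is cut with BamHI, which works because the two enzymes leave identical cohesive ends. The sketch below illustrates this using the standard recognition sites and cut positions for these enzymes (A^GATCT, G^GATCC, C^TCGAG), which are general knowledge rather than details given in this example. The helper names are hypothetical.

```python
# (recognition site, number of bases 5' of the cut on the top strand)
ENZYMES = {
    "Bgl II": ("AGATCT", 1),
    "BamHI": ("GGATCC", 1),
    "XhoI": ("CTCGAG", 1),
}


def overhang(site, cut):
    """Single-stranded 5' overhang left by a symmetric cut in a palindromic site."""
    return site[cut:len(site) - cut]


def hybrid_junction(upstream, downstream):
    """Top-strand sequence formed when an upstream-cut end is ligated to a downstream-cut end."""
    up_site, up_cut = ENZYMES[upstream]
    down_site, down_cut = ENZYMES[downstream]
    return up_site[:up_cut] + down_site[down_cut:]


overhangs = {name: overhang(site, cut) for name, (site, cut) in ENZYMES.items()}
junction = hybrid_junction("BamHI", "Bgl II")  # vector BamHI end joined to insert Bgl II end

print(overhangs)
print("BamHI/Bgl II junction:", junction)
```

The Bgl II and BamHI overhangs are both GATC, so the ends anneal and ligate, while the XhoI ends (TCGA) on the other side fix the orientation of the insert. The resulting hybrid junction matches neither original recognition site.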

Example 3: Expression of 11618130 In Human Embryonic Kidney 293 Cells 

The vector pcDNA3.1-11618130-S178-2 described in Example 2 was subsequently
transfected into human embryonic kidney 293 cells (ATCC No. CRL-1573; Manassas, VA)
using the LipofectaminePlus Reagent following the manufacturer's instructions (Gibco/BRL/Life
Technologies; Rockville, MD). The cell pellet and supernatant were harvested 72 hours after
transfection, and examined for 11618130 expression by use of SDS-PAGE under reducing
conditions and Western blotting with an anti-V5 antibody. FIG. 12 shows that 11618130 was
expressed as a protein having an apparent molecular weight (Mr) of approximately 34
kilodaltons (kDa) which was intracellularly expressed in the 293 cells. These experimental results
were consistent with the predicted molecular weight of 28043 Daltons for the protein of clone
11618130.0.27 and with the predicted localization of the protein intracellularly in the microbody
(peroxisome). A second band of approximately 54 kDa was also found, which may represent a
non-reducible dimer of this protein.

Example 4: Preparation of Mammalian Expression Vector pSecV5His

The oligonucleotide primers, pSec-V5-His Forward and pSec-V5-His Reverse, were
generated to amplify a fragment from the pcDNA3.1-V5His (Invitrogen; Carlsbad, CA)
expression vector that includes V5 and His6. The nucleotide sequences of these primers were:

pSec-V5-His Forward Primer:

CTCGTCCTCGAGGGTAAGCCTATCCCTAAC (SEQ ID NO:21)

pSec-V5-His Reverse Primer:

CTCGTCGGGCCCCTGATCAGCGGGTTTAAAC (SEQ ID NO:22)

The PCR product was digested with XhoI and ApaI, and ligated into the XhoI/ApaI-digested
pSecTag2 B vector harboring an Ig kappa leader sequence (Invitrogen; Carlsbad, CA).
The correct structure of the resulting vector (designated pSecV5His), including an in-frame
Ig-kappa leader and V5-His6, was verified by DNA sequence analysis. The pSecV5His vector
included an in-frame Ig kappa leader, a site for insertion of a clone of interest, V5 and His6,
which allows heterologous protein expression and secretion by fusing any protein to the Ig kappa
chain signal peptide. Detection and purification of the expressed protein was aided by the
presence of the V5 epitope tag and 6x His tag at the carboxyl-terminus (Invitrogen; Carlsbad,
CA).

Example 5: Molecular Cloning of 16406477

Oligonucleotide PCR primers were designed to amplify a DNA segment encoding the
mature form of clone 16406477 from amino acid residues 38 to 385, in recognition of the signal
sequence predicted for this polypeptide. The forward primer contained an in-frame BamHI
restriction site and the reverse primer contained an in-frame XhoI restriction site. Both primers
contained the CTCGTC 5' clamp. The sequences of the primers were as follows:

16406477 Forward Primer: 

CTCGTCGGATCCTGGGGCGCAGGGGAAGCCCCGGG (SEQ ID NO:23) 

16406477 Reverse Primer: 

CTCGTCCTCGAGGAGGGCAGCAAGGAGGCTGAGGGGCAG (SEQ ID NO:24)

The PCR reactions contained: 5 ng human fetal brain cDNA template; 1 µM of each of
the 16406477 Forward and 16406477 Reverse Primers; 5 µM dNTP (Clontech Laboratories;
Palo Alto, CA) and 1 µl of 50x Advantage-HF 2 polymerase (Clontech Laboratories; Palo Alto,
CA) in a 50 µl total reaction volume. PCR was then conducted using reaction conditions
identical to those previously described in Example 2.

A single, amplified product of approximately 1 kbp was detected by agarose gel
electrophoresis. The product was then isolated by QIAEX II® Gel Extraction System
(QIAGEN, Inc; Valencia, CA) in a total reaction volume of 20 µl.

A total of 10 µl of the isolated fragment was digested with BamHI and XhoI restriction
enzymes, and ligated into the pSecV5His mammalian expression vector (see, Example 4) which
had been previously digested with BamHI and XhoI. The construct was sequenced, and the
cloned insert was verified as possessing a sequence identical to that of the ORF coding for the
mature fragment of clone 16406477. The construct was subsequently designated
pSecV5His-16406477-S196-A.

Example 6: Expression of 16406477 in Human Embryonic Kidney 293 Cells 

The pSecV5His-16406477-S196-A construct (see, Example 5) was subsequently
transfected into 293 cells (ATCC No. CRL-1573; Manassas, VA) using the LipofectaminePlus
Reagent following the manufacturer's instructions (Gibco/BRL/Life Technologies). The cell
pellet and supernatant were harvested 72 hours after transfection, and examined for 16406477
expression by use of SDS-PAGE under reducing conditions and Western blotting with an
anti-V5 antibody. FIG. 13 demonstrates that 16406477 is expressed as a protein having an apparent
molecular weight (Mr) of approximately 45 kDa which is retained intracellularly in the 293 cells.
The Mr value which was found upon expression of the clone is consistent with the predicted
molecular weight of 43087 Daltons.

Example 7: Quantitative Tissue Expression Analysis of Clones of the Invention 

The Quantitative Expression Analysis of several clones of the invention was performed in
41 normal and 55 tumor samples (see, FIG. 14) by real-time quantitative PCR (TAQMAN®) by
use of a Perkin-Elmer Biosystems ABI PRISM® 7700 Sequence Detection System. The
following abbreviations are used in FIG. 14:

ca. = carcinoma,
* = established from metastasis,
met = metastasis,
s cell var = small cell variant,
non-s = non-sm = non-small,
squam = squamous,
pl. eff = pl effusion = pleural effusion,
glio = glioma,
astro = astrocytoma, and
neuro = neuroblastoma.



Initially, 96 RNA samples were normalized to β-actin and GAPDH. RNA (~50 ng total
or ~1 ng poly(A)+) was converted to cDNA using the TAQMAN® Reverse Transcription
Reagents Kit (PE Biosystems; Foster City, CA; Catalog No. N808-0234) and random hexamers
according to the manufacturer's protocol. Reactions were performed in a 20 µl total volume,
and incubated for 30 minutes at 48°C. cDNA (5 µl) was then transferred to a separate plate for
the TAQMAN® reaction using β-actin and GAPDH TAQMAN® Assay Reagents (PE
Biosystems; Catalog Nos. 4310881E and 4310884E, respectively) and TAQMAN® Universal
PCR Master Mix (PE Biosystems; Catalog No. 4304447) according to the manufacturer's
protocol. Reactions were performed in a 25 µl total volume using the following parameters:
2 minutes at 50°C; 10 minutes at 95°C; 15 seconds at 95°C/1 min. at 60°C (40 cycles total).

Results were recorded as CT values (i.e., cycle at which a given sample crosses a
threshold level of fluorescence) using a log scale, with the difference in RNA concentration
between a given sample and the sample with the lowest CT value being represented as 2^(ΔCT),
where ΔCT is the difference between the two CT values. The
percent relative expression is then obtained by taking the reciprocal of this RNA difference and
multiplying by 100. The average CT values obtained for β-actin and GAPDH were used to
normalize RNA samples. The RNA sample generating the highest CT value required no further
diluting, while all other samples were diluted relative to this sample according to their
β-actin/GAPDH average CT values.
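The relative expression calculation described above can be sketched in a few lines. This is an illustration only: it assumes the RNA difference is 2 raised to the power of the CT difference between a sample and the sample with the lowest CT, and the sample names and CT values are hypothetical, not data from FIG. 14.

```python
def percent_relative_expression(ct_values):
    """Map each sample to its expression relative to the lowest-CT sample, in percent.

    The RNA difference is taken as 2 ** (CT - CT_min); the percent relative
    expression is the reciprocal of that difference multiplied by 100.
    """
    ct_min = min(ct_values.values())
    return {
        sample: 100.0 / (2 ** (ct - ct_min))
        for sample, ct in ct_values.items()
    }


# Hypothetical CT values for three samples (illustration only).
example_cts = {"sample A": 25.0, "sample B": 26.0, "sample C": 28.0}
relative = percent_relative_expression(example_cts)

for sample, pct in relative.items():
    print(f"{sample}: {pct:.1f}%")
```

The sample with the lowest CT is assigned 100%, and each additional cycle needed to reach the threshold halves the reported expression.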

Normalized RNA (5 µl) was converted to cDNA and analyzed via TAQMAN® using One
Step RT-PCR Master Mix Reagents (PE Biosystems; Catalog No. 4309169) and gene-specific
primers according to the manufacturer's instructions. Probes and primers were designed for each
assay according to Perkin Elmer Biosystem's Primer Express Software package (Version I for
Apple Computer's Macintosh Power PC) using the sequence of the respective clones as input.
Default settings were used for reaction conditions and the following parameters were set before
selecting primers: primer concentration = 250 nM; primer melting temperature (Tm) range =
58°-60°C; primer optimal Tm = 59°C; maximum primer difference = 2°C; probe does not possess a
5'-terminus G; probe Tm must be 10°C greater than primer Tm; and amplicon size 75 bp to 100
bp in length. The probes and primers were synthesized by Synthegen (Houston, TX). Probes
were double-purified by HPLC to remove uncoupled dye and then evaluated by mass
spectroscopy to verify coupling of reporter and quencher dyes to the 5'- and 3'-termini of the
probe, respectively. The final concentrations used were: forward and reverse primers = 900
nM each; and probe = 200 nM.

Subsequent PCR conditions were as follows. Normalized RNA from each tissue and
each cell line was spotted in each well of a 96 well PCR plate (Perkin Elmer Biosystems). PCR
reaction mixes, including two probes (i.e., SECP-specific and another gene-specific probe
multiplexed with the SECP-specific probe) were set up using 1x TaqMan™ PCR Master Mix for
the PE Biosystems 7700, with 5 mM MgCl2; dNTPs (dA, G, C, U at 1:1:1:2 ratios); 0.25 U/ml
AmpliTaq Gold™ (PE Biosystems); 0.4 U/µl RNase inhibitor; and 0.25 U/µl Reverse
Transcriptase. Reverse transcription was then performed at 48°C for 30 minutes, followed by
amplification/PCR cycles as follows: 95°C for 10 minutes, then 40 cycles of 95°C for 15 seconds,
and 60°C for 1 minute.

The primer-probe sets employed in the expression analysis of each clone, and a summary
of the results, are provided below. The complete experimental results are illustrated in FIG. 14.
The panel of cell lines employed was identical in all cases except that samples 95 and 96 were
gDNA and a melanoma UACC-257 (control), respectively, in the experiments for clone
11696905. The nucleotide sequences of the primer sets used for these clones are as follows:

Clone 11696905.0.47 Primer Set:

Ag 383 (F): 5'-ggcctctccgtacccttctc-3' (SEQ ID NO:25)

Ag 383 (R): 5'-agaggctcttggcgcagtt-3' (SEQ ID NO:26)

Ag 383 (P): tet-5'-accaggatcacgacctccgcagg-3'-tamra (SEQ ID NO:27)

Primer Set Ag 383 was designed to probe for nucleotides 403-478 in SECP3 (clone
11696905.0.47). The results indicate that the clone was prominently expressed in normal cells
such as adipose, adrenal gland, various regions of the brain, skeletal muscle, bladder, liver and
fetal liver, mammary gland, placenta, prostate and testis. It was also found to be expressed at
levels much higher than comparable normal cells in cancers of the kidney and lung, and
expressed at levels much lower than comparable normal cells in cancers of the central nervous
system (CNS) and breast. These results suggest that SECP3 (clone 11696905.0.47), or
fragments thereof, may be useful in probing for cancer in kidney and lung, and that the nucleic
acid or the protein of clone 11696905.0.47 may be a target for therapeutic agents in such cancers.
These nucleic acids and proteins may be useful as therapeutic agents in treating cancers of the
CNS and breast.

Clone 16406477.0.206 Primer Set:

Ag 53 (F): 5'-gcctggcacggactatgtgt-3' (SEQ ID NO:28)

Ag 53 (R): 5'-GCCGTCAGCCTTGGAAAGT-3' (SEQ ID NO:29)

Ag 53 (P): TET-5'-CCATTCCCGCTGCACTGTGACG-3'-TAMRA (SEQ ID NO:30)

SECP7 (clone 16406477.0.206) was found to be expressed essentially exclusively in
testis cells, with a low level of expression in the hypothalamus, among the cells tested.

Clone 21433858 Primer Set:

Ag 127 (F): 5'-CCTGCCAGGATGACTGTCAATT-3' (SEQ ID NO:31)

Ag 127 (R): 5'-TGGTCCTAACTGCACCACAGTCT-3' (SEQ ID NO:32)

Ag 127 (P): TET-5'-CCAGCTGGTCCAAGTTTTCTTCATGCAA-3'-TAMRA (SEQ ID NO:33)

Probe set Ag 127 targets nucleotides 2524-2601 of SECP1 (clone 21433858). The results
show that the clone is expressed principally in normal tissues such as adipose, brain, bladder,
fetal and adult kidney, mammary gland, myometrium, uterus, placenta, and testis. In comparison
to normal lung tissue, it is highly expressed in a small cell lung cancer, a large cell lung cancer,
and a non-small cell lung cancer. Therefore, SECP1 (clone 21433858), or a fragment thereof,
may be useful as a diagnostic probe for such lung cancers. The nucleic acids or proteins of
SECP1 (clone 21433858) may furthermore serve as targets for the treatment of cancer in these
and other tissues.

Clone 21637262.0.64 Primer Set:

Ab5 (F): 5'-GTGATCCTCAGGCTGGACCA-3' (SEQ ID NO:34)

Ab5 (R): 5'-TTCTGACTGGGCTGCATCC-3' (SEQ ID NO:35)

Ab5 (P): fam-5'-ccagtgtttcctcagcacagggcc-3'-tamra (SEQ ID NO:36)

Probe set Ab5 targets nucleotides 1221-1298 in SECP9 (clone 21637262.0.64). The 
results shown in FIG. 14 demonstrate that SECP9 (clone 21637262.0.64) is expressed in cells 
from normal tissues including, especially, the salivary gland and trachea, among those cells 
examined. 

Table ??. Probe and Primer Set: Ag 815 for CG106318_01

Primers   Sequences                                    TM     Length   Start Position   SEQ ID NO
Forward   5'-TGTGCTCAGCACATGGTCTA-3'                   59     20       1722             37
Probe     FAM-5'-ACACCTGCTCAGGGAAAACGACAGAA-3'-TAMRA   69.9   26       1760             38
Reverse   5'-TCGTGCTCGTATCTGTTTCC-3'                   58.9   20       1787             39

Other Embodiments

While the invention has been described in conjunction with the detailed description
thereof, the foregoing description is intended to illustrate and not limit the scope of the invention,
which is defined by the scope of the appended claims. Other aspects, advantages, and
modifications are within the scope of the following claims.

WHAT IS CLAIMED IS:

1. An isolated polypeptide comprising an amino acid sequence selected from the group
consisting of: 

(a) a mature form of an amino acid sequence selected from the group 
consisting of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 
57;

(b) a variant of a mature form of an amino acid sequence selected from the
group consisting of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 
55 and 57 wherein one or more amino acid residues in said variant differs from the amino 
acid sequence of said mature form, provided that said variant differs in no more than 15% 
of the amino acid residues from the amino acid sequence of said mature form; 

(c) an amino acid sequence selected from the group consisting of SEQ ID 
NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57; and 

(d) a variant of an amino acid sequence selected from the group consisting of 
SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57 wherein 
one or more amino acid residues in said variant differs from the amino acid sequence of 
said mature form, provided that said variant differs in no more than 15% of amino acid 
residues from said amino acid sequence. 

2. The polypeptide of claim 1, wherein said polypeptide comprises the amino acid sequence
of a naturally-occurring allelic variant of an amino acid sequence selected from the group 
consisting of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 
57. 

3. The polypeptide of claim 2, wherein said allelic variant comprises an amino acid
sequence that is the translation of a nucleic acid sequence differing by a single nucleotide
from a nucleic acid sequence selected from the group consisting of SEQ ID NO:1, 3, 5, 7,
9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56.

4. The polypeptide of claim 1, wherein the amino acid sequence of said variant comprises a
conservative amino acid substitution. 

5. An isolated nucleic acid molecule comprising a nucleic acid sequence encoding a
polypeptide comprising an amino acid sequence selected from the group consisting of: 

(a) a mature form of an amino acid sequence selected from the group consisting of 
SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57; 

(b) a variant of a mature form of an amino acid sequence selected from the group 
consisting of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 
57 wherein one or more amino acid residues in said variant differs from the amino acid 
sequence of said mature form, provided that said variant differs in no more than 15% of 
the amino acid residues from the amino acid sequence of said mature form;

(c) an amino acid sequence selected from the group consisting of SEQ ID NO:2,
4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57; 

(d) a variant of an amino acid sequence selected from the group consisting of SEQ 
ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57 wherein one or 
more amino acid residues in said variant differs from the amino acid sequence of said 
mature form, provided that said variant differs in no more than 15% of amino acid 
residues from said amino acid sequence; 

(e) a nucleic acid fragment encoding at least a portion of a polypeptide 
comprising an amino acid sequence chosen from the group consisting of SEQ ID NO:2, 
4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57 or a variant of said 
polypeptide, wherein one or more amino acid residues in said variant differs from the 
amino acid sequence of said mature form, provided that said variant differs in no more 
than 15% of amino acid residues from said amino acid sequence; and 

(f) a nucleic acid molecule comprising the complement of (a), (b), (c), (d) or (e). 
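
As an illustration only, the 15% limit recited in (b), (d) and (e) can be checked computationally once a variant has been aligned to its reference sequence. The Python sketch below assumes the two sequences are already aligned, of equal length, and written in one-letter residue codes; the claims do not prescribe an alignment method, and the function names are hypothetical.

```python
def count_differences(reference: str, variant: str) -> int:
    """Count positions at which two pre-aligned sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    return sum(1 for a, b in zip(reference, variant) if a != b)


def within_variant_limit(reference: str, variant: str, max_percent: int = 15) -> bool:
    """Return True if no more than max_percent of residues differ.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    differences = count_differences(reference, variant)
    return differences * 100 <= max_percent * len(reference)
```

With a 20-residue reference, three substitutions (15%) pass this check and four (20%) do not. The same comparison applies to the 20% nucleotide limit by passing `max_percent=20`.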

The nucleic acid molecule of claim 5, wherein the nucleic acid molecule comprises the 
nucleotide sequence of a naturally-occurring allelic nucleic acid variant. 

The nucleic acid molecule of claim 5, wherein the nucleic acid molecule encodes a 
polypeptide comprising the amino acid sequence of a naturally-occurring polypeptide 
variant. 

The nucleic acid molecule of claim 5, wherein the nucleic acid molecule differs by a 
single nucleotide from a nucleic acid sequence selected from the group consisting of SEQ 
ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. 

The nucleic acid molecule of claim 5, wherein said nucleic acid molecule comprises a 
nucleotide sequence selected from the group consisting of 

(a) a nucleotide sequence selected from the group consisting of SEQ ID NO:1, 3, 
5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56; 

(b) a nucleotide sequence differing by one or more nucleotides from a nucleotide 
sequence selected from the group consisting of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 
40, 42, 44, 46, 48, 50, 52, 54 and 56 provided that no more than 20% of the nucleotides 
differ from said nucleotide sequence; 

(c) a nucleic acid fragment of (a); and 

(d) a nucleic acid fragment of (b). 

The nucleic acid molecule of claim 5, wherein said nucleic acid molecule hybridizes 
under stringent conditions to a nucleotide sequence chosen from the group consisting of 
SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 or a 
complement of said nucleotide sequence. 
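
For illustration, the complement referred to here can be written out at the sequence level as the reverse complement of a strand. The Python sketch below is an assumption-laden example: it handles only A, C, G, T and N, the function name is hypothetical, and it says nothing about whether two molecules would actually hybridize under stringent conditions, which is an experimental question.

```python
# Watson-Crick pairing for unambiguous bases; N is left as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    invalid = set(sequence) - set("ACGTNacgtn")
    if invalid:
        raise ValueError(f"unsupported characters: {sorted(invalid)}")
    return sequence.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a quick sanity check on any complement routine.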

The nucleic acid molecule of claim 5, wherein the nucleic acid molecule comprises a 
nucleotide sequence selected from the group consisting of 

(a) a first nucleotide sequence comprising a coding sequence differing by one or 
more nucleotide sequences from a coding sequence encoding said amino acid sequence, 
provided that no more than 20% of the nucleotides in the coding sequence in said first 
nucleotide sequence differ from said coding sequence; 

(b) an isolated second polynucleotide that is a complement of the first 
polynucleotide; and 

(c) a nucleic acid fragment of (a) or (b). 

A vector comprising the nucleic acid molecule of claim 11. 

The vector of claim 12, further comprising a promoter operably-linked to said nucleic 
acid molecule. 

A cell comprising the vector of claim 12. 

An antibody that immunospecifically-binds to the polypeptide of claim 1. 

The antibody of claim 15, wherein said antibody is a monoclonal antibody. 

The antibody of claim 15, wherein the antibody is a humanized antibody. 

A method for determining the presence or amount of the polypeptide of claim 1 in a 
sample, the method comprising: 

(a) providing the sample; 

(b) contacting the sample with an antibody that binds immunospecifically to 
the polypeptide; and 

(c) determining the presence or amount of antibody bound to said 
polypeptide, 

thereby determining the presence or amount of polypeptide in said sample. 

A method for determining the presence or amount of the nucleic acid molecule of claim 5 
in a sample, the method comprising: 

(a) providing the sample; 

(b) contacting the sample with a probe that binds to said nucleic acid 
molecule; and 

(c) determining the presence or amount of the probe bound to said nucleic 
acid molecule, 

thereby determining the presence or amount of the nucleic acid molecule in said 
sample. 

A method of identifying an agent that binds to a polypeptide of claim 1, the method 
comprising: 

(a) contacting said polypeptide with said agent; and 

(b) determining whether said agent binds to said polypeptide. 

A method for identifying an agent that modulates the expression or activity of the 
polypeptide of claim 1, the method comprising: 

(a) providing a cell expressing said polypeptide; 

(b) contacting the cell with said agent; and 

(c) determining whether the agent modulates expression or activity of said 
polypeptide, 

whereby an alteration in expression or activity of said peptide indicates said agent 
modulates expression or activity of said polypeptide. 

A method for modulating the activity of the polypeptide of claim 1, the method 
comprising contacting a cell sample expressing the polypeptide of said claim with a 
compound that binds to said polypeptide in an amount sufficient to modulate the activity 
of the polypeptide. 

A method of treating or preventing a SECP-associated disorder, said method comprising 
administering to a subject in which such treatment or prevention is desired the 
polypeptide of claim 1 in an amount sufficient to treat or prevent said SECP-associated 
disorder in said subject. 

The method of claim 23, wherein said subject is a human. 

A method of treating or preventing a SECP-associated disorder, said method comprising 
administering to a subject in which such treatment or prevention is desired the nucleic 
acid of claim 5 in an amount sufficient to treat or prevent said SECP-associated disorder 
in said subject. 

The method of claim 25, wherein said subject is a human. 

A method of treating or preventing a SECP-associated disorder, said method comprising 
administering to a subject in which such treatment or prevention is desired the antibody 
of claim 15 in an amount sufficient to treat or prevent said SECP-associated disorder in 
said subject. 

The method of claim 15, wherein the subject is a human. 

A pharmaceutical composition comprising the polypeptide of claim 1 and a 
pharmaceutically-acceptable carrier. 

A pharmaceutical composition comprising the nucleic acid molecule of claim 5 and a 
pharmaceutically-acceptable carrier. 

A pharmaceutical composition comprising the antibody of claim 15 and a 
pharmaceutically-acceptable carrier. 

A kit comprising in one or more containers, the pharmaceutical composition of claim 29. 
A kit comprising in one or more containers, the pharmaceutical composition of claim 30. 
A kit comprising in one or more containers, the pharmaceutical composition of claim 31. 

35. The use of a therapeutic in the manufacture of a medicament for treating a syndrome 
associated with a human disease, the disease selected from a SECP-associated disorder, wherein 
said therapeutic is selected from the group consisting of a SECP polypeptide, a SECP nucleic 
acid, and a SECP antibody. 

36. A method for screening for a modulator of activity or of latency or predisposition to a 
SECP-associated disorder, said method comprising: 

(a) administering a test compound to a test animal at increased risk for a SECP- 
associated disorder, wherein said test animal recombinantly expresses the polypeptide of claim 1; 

(b) measuring the activity of said polypeptide in said test animal after administering 
the compound of step (a); 

(c) comparing the activity of said protein in said test animal with the activity of said 
polypeptide in a control animal not administered said polypeptide, wherein a change in the 
activity of said polypeptide in said test animal relative to said control animal indicates the test 
compound is a modulator of latency of or predisposition to a SECP-associated disorder. 

37. The method of claim 36, wherein said test animal is a recombinant test animal that 
expresses a test protein transgene or expresses said transgene under the control of a promoter at 
an increased level relative to a wild-type test animal, and wherein said promoter is not the native 
gene promoter of said transgene. 

38. A method for determining the presence of or predisposition to a disease associated with 
altered levels of the polypeptide of claim 1 in a first mammalian subject, the method comprising: 

(a) measuring the level of expression of the polypeptide in a sample from the first 
mammalian subject; and 

(b) comparing the amount of said polypeptide in the sample of step (a) to the amount 
of the polypeptide present in a control sample from a second mammalian subject known not to 
have, or not to be predisposed to, said disease, 

wherein an alteration in the expression level of the polypeptide in the first subject as compared to 
the control sample indicates the presence of or predisposition to said disease. 

39. A method for determining the presence of or predisposition to a disease associated with 
altered levels of the nucleic acid molecule of claim 5 in a first mammalian subject, the method 
comprising: 

(a) measuring the amount of the nucleic acid in a sample from the first mammalian 
subject; and 

(b) comparing the amount of said nucleic acid in the sample of step (a) to the amount 
of the nucleic acid present in a control sample from a second mammalian subject known not to 
have or not be predisposed to, the disease; 

wherein an alteration in the level of the nucleic acid in the first subject as compared to the 
control sample indicates the presence of or predisposition to the disease. 

40. A method of treating a pathological state in a mammal, the method comprising 
administering to the mammal a polypeptide in an amount that is sufficient to alleviate the 
pathological state, wherein the polypeptide is a polypeptide having an amino acid sequence at 
least 95% identical to a polypeptide comprising an amino acid sequence of at least one of SEQ 
ID NO:2, 4, 6, 8, 10, 12, 14, 16, and 18, or a biologically active fragment thereof. 

41. A method of treating a pathological state in a mammal, the method comprising 
administering to the mammal the antibody of claim 15 in an amount sufficient to alleviate the 
pathological state. 

POLYPEPTIDES AND POLYNUCLEOTIDES ENCODING SAME 



ABSTRACT 

The invention provides polypeptides, designated herein as SECP polypeptides, as 
well as polynucleotides encoding SECP polypeptides, and antibodies that immunospecifically-bind 
to SECP polypeptide or polynucleotide, or derivatives, variants, mutants, or fragments 
thereof. The invention additionally provides methods in which the SECP polypeptide, 
polynucleotide, and antibody are used in the detection, prevention, and treatment of a broad 
range of pathological states. 

GACAGAGTGCAGCCTTTTCAGACTCTGTGACACAGTTCCCCTTTT 
GCAAAAATACTTAGCGAGGATCATTACTTTCCAACAGTCGTGTCC 
AGAGACCTACTTTGTAACACCGCAGGGAAGTTAATGTACTAGGTC 
TTGAAAGGTCTTTCTGGAATGTGCAGTAACTTGTAGTTTTCTTCT 
AGTAGCACTGCTAATTTTTGTGTTATAATTTTTGTAGGTCCATGG 

GGCCGATGTATGGGAGATGAATGTGGTCCCGGAGGCATCCAAACG 



ArgAlaValTrpCysAlaHisValGluGlyTrpThrThrLeuHis 

ACTAACTGTAAGCAGGCCGAGAGACCCAATAACCAGCAGAATTGT 
ThrAsnCysLysGlnAlaGluArgProAsnAsnGlnGlnAsnCys 

TTCAAAGTTTGCGATTGGCACAAAGAGTTGTACGACTGGAGACTG 
PheLysValCysAspTrpHisLysGluLeuTyrAspTrpArgLeu 

GGACCTTGGAATCAGTGTCAGCCCGTGATTTCAAAAAGCCTAGAG 
GlyProTrpAsnGlnCysGlnProValIleSerLysSerLeuGlu 

AAACCTCTTGAGTGCATTAAGGGGGAAGAAGGTATTCAGGTGAGG 
LysProLeuGluCysIleLysGlyGluGluGlyIleGlnValArg 

GAGATAGCGTGCATCCAGAAAGACAAAGACATTCCTGCGGAGGAT 
GluIleAlaCysIleGlnLysAspLysAspIleProAlaGluAsp 

ATCATCTGTGAGTACTTTGAGCCCAAGCCTCTCCTGGAGCAGGCT 
IleIleCysGluTyrPheGluProLysProLeuLeuGluGlnAla 

TGCCTCATTCCTTGCCAGCAAGATTGCATCGTGTCTGAATTTTCT 
CysLeuIleProCysGlnGlnAspCysIleValSerGluPheSer 

GCCTGGTCCGAATGCTCCAAGACCTGCGGCAGCGGGCTCCAGCAC 
AlaTrpSerGluCysSerLysThrCysGlySerGlyLeuGlnHis 

CGGACGCGTCATGTGGTGGCGCCCCCGCAGTTCGGAGGCTCTGGC 
ArgThrArgHisValValAlaProProGlnPheGlyGlySerGly 

TGTCCAAACCTGACGGAGTTCCAGGTGTGCCAATCCAGTCCATGC 
CysProAsnLeuThrGluPheGlnValCysGlnSerSerProCys 

GAGGCCGAGGAGCTCAGGTACAGCCTGCATGTGGGGCCCTGGAGC 
GluAlaGluGluLeuArgTyrSerLeuHisValGlyProTrpSer 

ACCTGCTCAATGCCCCACTCCCGACAAGTAAGACAAGCAAGGAGA 
ThrCysSerMetProHisSerArgGlnValArgGlnAlaArgArg 

856 CGCGGGAAGAATAAAGAACGGGAAAAGGACCGCAGCAAAGGAGTA 
ArgGlyLysAsnLysGluArgGluLysAspArgSerLysGlyVal 

901 AAGGATCCAGAAGCCCGCGAGCTTATTAAGAAAAAGAGAAACAGA 
LysAspProGluAlaArgGluLeuIleLysLysLysArgAsnArg 

946 AACAGGCAGAACAGACAAGAGAACAAATATTGGGACATCCAGATT 
AsnArgGlnAsnArgGlnGluAsnLysTyrTrpAspIleGlnIle 

991 GGATATCAGACCAGAGAGGTTATGTGCATTAACAAGACGGGGAAA 
GlyTyrGlnThrArgGluValMetCysIleAsnLysThrGlyLys 

1036 GCTGCTGATTTAAGCTTTTGCCAGCAAGAGAAGCTTCCAATGACC 
AlaAlaAspLeuSerPheCysGlnGlnGluLysLeuProMetThr 

1081 TTCCAGTCCTGTGTGATCACCAAAGAGTGCCAGGTTTCCGAGTGG 
PheGlnSerCysValIleThrLysGluCysGlnValSerGluTrp 

1126 TCAGAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCC 
SerGluTrpSerProCysSerLysThrCysHisAspMetValSer 

1171 CCTGCAGGCACTCGTGTAAGGACACGAACCATCAGGCAGTTTCCC 
ProAlaGlyThrArgValArgThrArgThrIleArgGlnPhePro 

1216 ATTGGCAGTGAAAAGGAGTGTCCAGAATTTGAAGAAAAAGAACCC 
IleGlySerGluLysGluCysProGluPheGluGluLysGluPro 

1261 TGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCACGTATGGC 
CysLeuSerGlnGlyAspGlyValValProCysAlaThrTyrGly 

1306 TGGAGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTC 
TrpArgThrThrGluTrpThrGluCysArgValAspProLeuLeu 

1351 AGTCAGCAGGACAAGAGGCGCGGCAACCAGACGGCCCTCTGTGGA 
SerGlnGlnAspLysArgArgGlyAsnGlnThrAlaLeuCysGly 

1396 GGGGGCATCCAGACCCGAGAGGTGTACTGCGTGCAGGCCAACGAA 
GlyGlyIleGlnThrArgGluValTyrCysValGlnAlaAsnGlu 

1441 AACCTCCTCTCACAATTAAGTACCCACAAGAACAAAGAAGCCTCA 
AsnLeuLeuSerGlnLeuSerThrHisLysAsnLysGluAlaSer 

1486 AAGCCAATGGACTTAAAATTATGCACTGGACCTATCCCTAATACT 
LysProMetAspLeuLysLeuCysThrGlyProIleProAsnThr 

1531 ACACAGCTGTGCCACATTCCTTGTCCAACTGAATGTGAAGTTTCA 
ThrGlnLeuCysHisIleProCysProThrGluCysGluValSer 



Fig 1 (continued) 

1576 CCTTGGTCAGCTTGGGGACCTTGTACTTATGAAAACTGTAATGAT 
ProTrpSerAlaTrpGlyProCysThrTyrGluAsnCysAsnAsp 

1621 CAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGCGCATTACC 
GlnGlnGlyLysLysGlyPheLysLeuArgLysArgArgIleThr 

1666 AATGAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCAC 
AsnGluProThrGlyGlySerGlyValThrGlyAsnCysProHis 

1711 TTACTGGAAGCCATTCCCTGTGAAGAGCCTGCCTGTTATGACTGG 
LeuLeuGluAlaIleProCysGluGluProAlaCysTyrAspTrp 

1756 AAAGCGGTGAGACTGGGAGACTGCGAGCCAGATAACGGAAAGGAG 
LysAlaValArgLeuGlyAspCysGluProAspAsnGlyLysGlu 

1801 TGTGGTCCAGGCACGCAAGTTCAAGAGGTTGTGTGCATCAACAGT 
CysGlyProGlyThrGlnValGlnGluValValCysIleAsnSer 

1846 GATGGAGAAGAAGTTGACAGACAGCTGTGCAGAGATGCCATCTTC 
AspGlyGluGluValAspArgGlnLeuCysArgAspAlaIlePhe 

1891 CCCATCCCTGTGGCCTGTGATGCCCCATGCCCGAAAGACTGTGTG 
ProIleProValAlaCysAspAlaProCysProLysAspCysVal 

1936 CTCAGCACATGGTCTACGTGGTCCTCCTGCTCACACACCTGCTCA 
LeuSerThrTrpSerThrTrpSerSerCysSerHisThrCysSer 

1981 GGGAAAACGACAGAAGGGAAACAGATACGAGCACGATCCATTCTG 
GlyLysThrThrGluGlyLysGlnIleArgAlaArgSerIleLeu 

2026 GCCTATGCGGGTGAAGAAGGTGGAATTCGCTGTCCAAATAGCAGT 
AlaTyrAlaGlyGluGluGlyGlyIleArgCysProAsnSerSer 

2071 GCTTTGCAAGAAGTACGAAGCTGTAATGAGCATCCTTGCACAGTG 
AlaLeuGlnGluValArgSerCysAsnGluHisProCysThrVal 

2116 TACCACTGGCAAACTGGTCCCTGGGGCCAGTGCATTGAGGACACC 
TyrHisTrpGlnThrGlyProTrpGlyGlnCysIleGluAspThr 

2161 TCAGTATCGTCCTTCAACACAACTACGACTTGGAATGGGGAGGCC 
SerValSerSerPheAsnThrThrThrThrTrpAsnGlyGluAla 

2206 TCCTGCTCTGTCGGCATGCAGACAAGAAAAGTCATCTGTGTGCGA 
SerCysSerValGlyMetGlnThrArgLysValIleCysValArg 

2251 GTCAATGTGGGCCAAGTGGGACCCAAAAAATGTCCTGAAAGCCTT 
ValAsnValGlyGlnValGlyProLysLysCysProGluSerLeu 

2296 CGACCTGAAACTGTAAGGCCTTGTCTGCTTCCTTGTAAGAAGGAC 
ArgProGluThrValArgProCysLeuLeuProCysLysLysAsp 

2341 TGTATTGTGACCCCATATAGTGACTGGACATCATGCCCCTCTTCG 
CysIleValThrProTyrSerAspTrpThrSerCysProSerSer 

2386 TGTAAAGAAGGGGACTCCAGTATCAGGAAGCAGTCTAGGCATCGG 
CysLysGluGlyAspSerSerIleArgLysGlnSerArgHisArg 

2431 GTCATCATTCAGCTGCCAGCCAACGGGGGCCGAGACTGCACAGAT 
ValIleIleGlnLeuProAlaAsnGlyGlyArgAspCysThrAsp 

2476 CCCCTCTATGAAGAGAAGGCCTGTGAGGCACCTCAAGCGTGCCAA 
ProLeuTyrGluGluLysAlaCysGluAlaProGlnAlaCysGln 

2521 AGCTACAGGTGGAAGACTCACAAATGGCGCAGATGCCAATTAGTC 
SerTyrArgTrpLysThrHisLysTrpArgArgCysGlnLeuVal 

2566 CCTTGGAGCGTGCAACAAGACAGCCCTGGAGCACAGGAAGGCTGT 
ProTrpSerValGlnGlnAspSerProGlyAlaGlnGluGlyCys 

2611 GGGCCTGGGCGACAGGCAAGAGCCATTACTTGTCGCAAGCAAGAT 
GlyProGlyArgGlnAlaArgAlaIleThrCysArgLysGlnAsp 

2656 GGAGGACAGGCTGGAATCCATGAGTGCCTACAGTATGCAGGCCCT 
GlyGlyGlnAlaGlyIleHisGluCysLeuGlnTyrAlaGlyPro 

2701 GTGCCAGCCCTTACCCAGGCCTGCCAGATCCCCTGCCAGGATGAC 
ValProAlaLeuThrGlnAlaCysGlnIleProCysGlnAspAsp 

2746 TGTCAATTGACCAGCTGGTCCAAGTTTTCTTCATGCAATGGAGAC 
CysGlnLeuThrSerTrpSerLysPheSerSerCysAsnGlyAsp 

2791 TGTGGTGCAGTTAGGACCAGAAAGCGCACTCTTGTTGGAAAAAGT 
CysGlyAlaValArgThrArgLysArgThrLeuValGlyLysSer 

2836 AAAAAGAAGGAAAAATGTAAAAATTCCCATTTGTATCCCCTGATT 
LysLysLysGluLysCysLysAsnSerHisLeuTyrProLeuIle 

2881 GAGACTCAGTATTGTCCTTGTGACAAATATAATGCACAACCTGTG 
GluThrGlnTyrCysProCysAspLysTyrAsnAlaGlnProVal 

2926 GGGAACTGGTCAGACTGTATTTTACCAGAGGGAAAAGTGGAAGTG 
GlyAsnTrpSerAspCysIleLeuProGluGlyLysValGluVal 

2971 TTGCTGGGAATGAAAGTACAAGGAGACATCAAGGAATGCGGACAA 
LeuLeuGlyMetLysValGlnGlyAspIleLysGluCysGlyGln 

GGATATCGTTACCAAGCAATGGCATGCTACGATCAAAATGGCAGG 
GlyTyrArgTyrGlnAlaMetAlaCysTyrAspGlnAsnGlyArg 

CTTGTGGAAACATCTAGATGTAACAGCCATGGTTACATTGAGGAG 
LeuValGluThrSerArgCysAsnSerHisGlyTyrIleGluGlu 

GCCTGCATCATCCCCTGCCCCTCAGACTGCAAGCTCAGTGAGTGG 
AlaCysIleIleProCysProSerAspCysLysLeuSerGluTrp 

TCCAACTGGTCGCGCTGCAGCAAGTCCTGTGGGAGTGGTGTGAAG 
SerAsnTrpSerArgCysSerLysSerCysGlySerGlyValLys 

GTTCGTTCTAAATGGCTGCGTGAAAAACCATATAATGGAGGAAGG 
ValArgSerLysTrpLeuArgGluLysProTyrAsnGlyGlyArg 

CCTTGCCCCAAACTGGACCATGTCAACCAGGCACAGGTGTATGAG 
ProCysProLysLeuAspHisValAsnGlnAlaGlnValTyrGlu 

GTTGTCCCATGCCACAGTGACTGCAACCAGTACCTATGGGTCACA 
ValValProCysHisSerAspCysAsnGlnTyrLeuTrpValThr 

GAGCCCTGGAGCATCTGCAAGGTGACCTTTGTGAATATGCGGGAG 
GluProTrpSerIleCysLysValThrPheValAsnMetArgGlu 

AACTGTGGAGAGGGCGTGCAAACCCGAAAAGTGAGATGCATGCAG 
AsnCysGlyGluGlyValGlnThrArgLysValArgCysMetGln 

AATACAGCAGATGGCCCTTCTGAACATGTAGAGGATTACCTCTGT 
AsnThrAlaAspGlyProSerGluHisValGluAspTyrLeuCys 

GACCCAGAAGAGATGCCCCTGGGCTCTAGAGTGTGCAAATTACCA 
AspProGluGluMetProLeuGlySerArgValCysLysLeuPro 

TGCCCTGAGGACTGTGTGATATCTGAATGGGGTCCATGGACCCAA 
CysProGluAspCysValIleSerGluTrpGlyProTrpThrGln 

TGTGTTTTGCCTTGCAATCAAAGCAGTTTCCGGCAAAGGTCAGCT 
CysValLeuProCysAsnGlnSerSerPheArgGlnArgSerAla 

GATCCCATCAGACAACCAGCTGATGAAGGAAGATCTTGCCCTAAT 
AspProIleArgGlnProAlaAspGluGlyArgSerCysProAsn 

GCTGTTGAGAAAGAACCCTGTAACCTGAACAAAAACTGCTACCAC 
AlaValGluLysGluProCysAsnLeuAsnLysAsnCysTyrHis 

TATGATTATAATGTAACAGACTGGAGTACATGTCAGCTGAGTGAG 
TyrAspTyrAsnValThrAspTrpSerThrCysGlnLeuSerGlu 



Fig 1 (continued) 
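
The figure pairs each 45-nucleotide row with its three-letter amino acid translation. As a hedged illustration of that pairing, the Python sketch below translates a DNA string with the standard genetic code; the use of the standard code, the choice to stop at the first stop codon, and the function name `translate` are assumptions made for this example and are not stated in the figure.

```python
BASES = "TCAG"
# One-letter amino acids for the 64 codons in TCAG order (standard genetic code).
AMINO_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
CODON_TABLE = {
    a + b + c: AMINO_1[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1 into three-letter codes, stopping at a stop codon."""
    dna = "".join(dna.split()).upper()  # drop stray whitespace from extraction
    residues = []
    for pos in range(0, len(dna) - 2, 3):
        codon = dna[pos:pos + 3]
        residue = CODON_TABLE.get(codon)
        if residue is None:
            raise ValueError(f"cannot translate codon {codon!r}")
        if residue == "*":
            break
        residues.append(THREE_LETTER[residue])
    return "".join(residues)


# A row taken from Fig 1 above.
row = "ACTAACTGTAAGCAGGCCGAGAGACCCAATAACCAGCAGAATTGT"
print(translate(row))
```

Run on that row, the sketch reproduces the translation line printed beneath it in the figure, which makes it a convenient way to cross-check rows whose amino acid line was damaged in extraction.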




3736 AAGGCAGTTTGTGGAAATGGAATAAAAACAAGGATGTTGGATTGT 
LysAlaValCysGlyAsnGlyIleLysThrArgMetLeuAspCys 

3781 GTTCGAAGTGATGGCAAGTCAGTTGACCTGAAATATTGTGAAGCG 
ValArgSerAspGlyLysSerValAspLeuLysTyrCysGluAla 

3826 CTTGGCTTGGAGAAGAACTGGCAGATGAACACGTCCTGCATGGTG 
LeuGlyLeuGluLysAsnTrpGlnMetAsnThrSerCysMetVal 

3871 GAATGCCCTGTGAACTGTCAGCTTTCTGATTGGTCTCCTTGGTCA 
GluCysProValAsnCysGlnLeuSerAspTrpSerProTrpSer 

3916 GAATGTTCTCAAACATGTGGCCTCACAGGAAAAATGATCCGAAGA 
GluCysSerGlnThrCysGlyLeuThrGlyLysMetIleArgArg 

3961 CGAACAGTGACCCAGCCCTTTCAAGGTGATGGAAGACCATGCCCT 
ArgThrValThrGlnProPheGlnGlyAspGlyArgProCysPro 

4006 TCCCTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTAT 
SerLeuMetAspGlnSerLysProCysProValLysProCysTyr 

4051 CGGTGGCAATATGGCCAGTGGTCTCCATGCCAAGTGCAGGAGGCC 
ArgTrpGlnTyrGlyGlnTrpSerProCysGlnValGlnGluAla 

4096 CAGTGTGGAGAAGGGACCAGAACAAGGAACATTTCTTGTGTAGTA 
GlnCysGlyGluGlyThrArgThrArgAsnIleSerCysValVal 

4141 AGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGGATGAGGAA 
SerAspGlySerAlaAspAspPheSerLysValValAspGluGlu 

4186 TTCTGTGCTGACATTGAACTCATTATAGATGGTAATAAAAATATG 
PheCysAlaAspIleGluLeuIleIleAspGlyAsnLysAsnMet 

4231 GTTCTGGAGGAATCCTGCAGCCAGCCTTGCCCAGGTGACTGTTAT 
ValLeuGluGluSerCysSerGlnProCysProGlyAspCysTyr 

4276 TTGAAGGACTGGTCTTCCTGGAGCCTGTGTCAGCTGACCTGTGTG 
LeuLysAspTrpSerSerTrpSerLeuCysGlnLeuThrCysVal 

4321 AATGGTGAGGATCTAGGCTTTGGTGGAATACAGGTCAGATCCAGA 
AsnGlyGluAspLeuGlyPheGlyGlyIleGlnValArgSerArg 

4366 CCGGTGATTATACAAGAACTAGAGAATCAGCATCTGTGCCCAGAG 
ProValIleIleGlnGluLeuGluAsnGlnHisLeuCysProGlu 

4411 CAGATGTTAGAAACAAAATCATGTTATGATGGACAGTGCTATGAA 
GlnMetLeuGluThrLysSerCysTyrAspGlyGlnCysTyrGlu 

4456 TATAAATGGATGGCCAGTGCTTGGAAGGGCTCTTCCCGAACAGTG 
TyrLysTrpMetAlaSerAlaTrpLysGlySerSerArgThrVal 

4501 TGGTGTCAAAGGTCAGATGGTATAAATGTAACAGGGGGCTGCTTG 
TrpCysGlnArgSerAspGlyIleAsnValThrGlyGlyCysLeu 

4546 GTGATGAGCCAGCCTGATGCCGACAGGTCTTGTAACCCACCGTGT 
ValMetSerGlnProAspAlaAspArgSerCysAsnProProCys 

4591 AGTCAACCCCACTCGTACTGTAGCGAGACAAAAACATGCCATTGT 
SerGlnProHisSerTyrCysSerGluThrLysThrCysHisCys 

4636 GAAGAAGGGTACACTGAAGTCATGTCTTCTAACAGCACCCTTGAG 
GluGluGlyTyrThrGluValMetSerSerAsnSerThrLeuGlu 

4681 CAATGCACACTTATCCCCGTGGTGGTATTACCCACCATGGAGGAC 
GlnCysThrLeuIleProValValValLeuProThrMetGluAsp 

4726 AAAAGAGGAGATGTGAAAACCAGTCGGGCTGTACATCCAACCCAA 
LysArgGlyAspValLysThrSerArgAlaValHisProThrGln 

4771 CCCTCCAGTAACCCAGCAGGACGGGGAAGGACCTGGTTTCTACAG 
ProSerSerAsnProAlaGlyArgGlyArgThrTrpPheLeuGln 

4816 CCATTTGGGCCAGATGGGAGACTAAAGACCTGGGTTTACGGTGTA 
ProPheGlyProAspGlyArgLeuLysThrTrpValTyrGlyVal 

4861 GCAGCTGGGGCATTTGTGTTACTCATCTTTATTGTCTCCATGATT 
AlaAlaGlyAlaPheValLeuLeuIlePheIleValSerMetIle 

4906 TATCTAGCTTGCAAAAAGCCAAAGAAACCCCAAAGAAGGCAAAAC 
TyrLeuAlaCysLysLysProLysLysProGlnArgArgGlnAsn 

4951 AACCGACTGAAACCTTTAACCTTAGCCTATGATGGAGATGCCGAC 
AsnArgLeuLysProLeuThrLeuAlaTyrAspGlyAspAlaAsp 

4996 ATGTAACATATAACTTTTCCTGGCAACAACCAGTTTCGGCTTTCT 
Met 

5041 GACTTCATAGATGTCCAGAGGCCACAACAAATGTATCCAAACTGT 

5086 GTGGATTAAAATATATTTTAATTTTTAAAAATGGCATCATAAAGA 

5131 CAAGAGTGAAAATCATACTGCCACTGGAGATATTTAAGACAGTAC 

5176 CACTTATATACAGACCATCAACCGTGAGAATTATAGGAGATTTAG 

5221 CTGAATACATGCTGCATTCTGAAAGTTTTATGTCATCTTTTCTGA 

5266 AATCTACCGACTGAAAAACCACTTTCATCTCTAAAAAATAATGGT 

5311 GGAATTGGCCAGTTAGGATGCCTGATACAAGACCGTCTGCAGTGT 

5356 TAATCCATAAAACTTCCTAGCATGAAGAGTTTCTACCAAGATCTC 

5401 CACAATACTATGGTCAAATTAACATGTGTACTCAGTTGAATGACA 

5446 CACATTATGTCAGATTATGTACTTGCTAATAAGCAATTTTAACAA 

5491 TGCATAACAAATAAACTCTAAGCTAAGCAGAAAATCCACTGAATA 

5536 AATTCAGCATCTTGGTGGTCGATGGTAGATTTTATTGACCTGCAT 

5581 TTCAGAGACAAAGCCTCTTTTTTAAGACTTCTTGTCTCTCTCCAA 

5626 AGTAAGAATGCTGGACAAGTACTAGTGTCTTAGAAGAACGAGTCC 

5671 TCAAGTTCAGTATTTTATAGTGGTAATTGTCTGGAAAACTAATTT 

5716 ACTTGTGTTAATACAATACGTTTCTACTTTCCCTGATTTTCAAAC 

5761 TGGTTGCCTGCATCTTTTTTGCTATATGGAAGGCACATTTTTGCA 

5806 CTATATTAGTGCAGCACGATAGGCGCTTAACCAGTATTGCCATAG 

5851 AAACTGCCTCTTTTCATGTGGGATGAAGACATCTGTGCCAAGAGT 

5896 GGCATGAAGACATTTGCAAGTTCTTGTATCCTGAAGAGAGTAAAG 

5941 TTCAGTTTGGATGGCAGCAAGATGAAATCAGCTATTACACCTGCT 

5986 GTACACACACTTCCTCATCACTGCAGCCATTGTGAAATTGACAAC 

6031 ATGGCGGTAATTTAAGTGTTGAAGTCCCTAACCCCTTAACCCTCT 

6076 AAAAGGTGGATTCCTCTAGTTGGTTTGTAATTGTTCTTTGAAGGC 

6121 TGTTTATGACTAGATTTTTATATTTGTTATCTTTGTTAAGAAAAA 

6166 AAAAAGAAAAAGGAACTGGATGTCTTTTTAATTTTGAGCAGATGG 

6211 AGAAAATAAATAATGTATCAATGACCTTTGTAACTAAAGGAAAAA 

6256 AAAAAAAAATGTGGATTTTCCTTTCTCTCTGATTTCCCAGTTTCA 

6301 GATTGAATGTCTGTCTTGCAGGCAGTTATTTCAAAATCCATAGTC 

6346 TTTNGCCTTTCTCACTGGCAAAATTTGA 

CACCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGAG 
CCTCCTGCTGGGCCACTGGCTGGGATCAGGACACCAGTGATGGTA 
AGTGCTGGCCCAGACTGAAGCTCGGAGAGGCACTCTGCTTGCCCA 
GCGTCACAGTCTTAGCTCCCAACTGTCCTGGCTTCCAGTCTCCCT 
TGCTTCCCAGATCCCAGACTCTAGCCCCAGCCCCGTCTCTTTCAC 
CAGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTC 
GCCCCACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACC 
TGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGC 
CTGGGGTGCAGGGCCCCTGTCAGGTCTGATAGGGAGAAGAGAAGG 
AGCAGAAGGGGAGGGGCCTAACCCTGGGCTGGGGGTTGGACTCAC 
AGGACTGGGGGAAAGAGCTGCAATCAGAGGGTGTCTGCCATAGCT 
GGGCTCAGGCATCTGTCCTTGGCTTTGTTGCCTGGCTCCAGGGAG 
ATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGG 
TTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGG 
ACGCTCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGC 
TGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAGAGCCCAG 

AGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGAT 
MetSerAspGluAspSerCysValAlaCysGlyS 

CCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGC 
erLeuArgThrAlaGlyProGlnAlaGlyAlaProSerProTrpP 

CCTGGGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCG 
roTrpGluAlaArgLeuMetHisGlnGlyGlnLeuAlaCysGlyG 

GAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCT 
lyAlaLeuValSerGluGluAlaValLeuThrAlaAlaHisCysP 

TCAATGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGA 
heAsnGlyArgGlnAlaProGluGluTrpSerValGlyLeuGlyT 

CCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAG 
hrArgProGluGluTrpGlyLeuLysGlnLeuIleLeuHisGlyA 

CCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGC 
laTyrThrHisProGluGlyGlyTyrAspMetAlaLeuLeuLeuL 

TGGCTCAGCCTGTGACACTGGGAGCCAGCCTGCGGGCCCTCTGCC 
euAlaGlnProValThrLeuGlyAlaSerLeuArgAlaLeuCysL 

TGCCCTATTTTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGG 
euProTyrPheAspHisHisLeuProAspGlyGluArgGlyTrpV 

TTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGA 
alLeuGlyArgAlaArgProGlyAlaGlyIleSerSerLeuGlnT 

CAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGC 
hrValProValThrLeuLeuGlyProArgAlaCysSerArgLeuH 

1216 ATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCGGGGATGG 
isAlaAlaProGlyGlyAspGlySerProIleLeuProGlyMetV 

1261 TGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGCCTGT 
alCysThrSerAlaValGlyGluLeuProSerCysGluGlyLeuS 

1306 CTGGGGCACCACTGGTGCATGAGGTGAGGGGCACATGGTTCCTGG 
erGlyAlaProLeuValHisGluValArgGlyThrTrpPheLeuA 

1351 CCGGGCTGCACAGCTTCGGAGATGCTTGCCAAGGCCCCGCCAGGC 
laGlyLeuHisSerPheGlyAspAlaCysGlnGlyProAlaArgP 

1396 CGGCGGTCTTCACCGCGCTCCCTGCCTATGAGGACTGGGTCAGCA 
roAlaValPheThrAlaLeuProAlaTyrGluAspTrpValSerS 

1441 GTTTGGACTGGCAGGTCTACTTCGCCGAGGAACCAGAGCCCGAGG 
erLeuAspTrpGlnValTyrPheAlaGluGluProGluProGluA 

1486 CTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCT 
laGluProGlySerCysLeuAlaAsnIleSerGlnProThrSerC 

1531 GCTGACAGGGGACCTGGCCATTCTCAGGACAAGAGAATGCAGGCA 
ys 

1576 GGCAAATGGCATTACTGCCCCTGTCCTCCCCACCCTGTCATGTGT 

1621 GATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGGA 

1666 AGGAACCTGCCTGGGGCCACAGGTGCCCCCTCCCCACCCTGCAGG 

1711 ACAGGGGTGTCTGTGGACACTCCCACACCGAACTCTGCTACCAAG 

1756 CAGGCGTCTCAGCTTTCCTCCTCCTTTACCCTTTCAGATACAATC 

1801 ACGCCAGCCCCGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAGC 

1846 AGTTTTCCTTTTTTTAAACTTAAATAAATTGTTACAAAATAGACT 

1891 TTAG 

1 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCT 
46 GCAACCAAGCGGGTCTTACCCCCGGTCCTCCGCGTCTCCAGTCCT 
91 CGCACCTGGAACCCCAACGTCCCCGAGAGTCCCCGAATCCCCGCT 

136 CCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 

MetSerGlyAlaProThrAlaGlyAla 

181 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAG 
AlaLeuMetLeuCysAlaAlaThrAlaValLeuLeuSerAlaGln 

226 GGCGGACCCGTGCAGTCCAAGTCGCCGCGCTTTGCGTCCTGGGAC 
GlyGlyProValGlnSerLysSerProArgPheAlaSerTrpAsp 

271 GAGATGAATGTCCTGGCGCACGGACTCCTGCAGCTCGGCCAGGGG 
GluMetAsnValLeuAlaHisGlyLeuLeuGlnLeuGlyGlnGly 

316 TGCGCGAACACCGGAGCGCACCCGCAGTCAGCTGAGCGCGCTGGA 
CysAlaAsnThrGlyAlaHisProGlnSerAlaGluArgAlaGly 

361 GCGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGG 
AlaArgLeuSerAlaCysGlySerAlaCysGlnGlyThrGluGly 

406 TCCACCGACCTCCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAG 
SerThrAspLeuProLeuAlaProGluSerArgValAspProGlu 

451 GTCCTTCACAGCCTGCAGACACAACTCAAGGCTCAGAACAGCAGG 
ValLeuHisSerLeuGlnThrGlnLeuLysAlaGlnAsnSerArg 

496 ATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCACCTG 
IleGlnGlnLeuPheHisLysValAlaGlnGlnGlnArgHisLeu 

541 GAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGC 
GluLysGlnHisLeuArgIleGlnHisLeuGlnSerGlnPheGly 

586 CTCCTGGACCACAAGCACCTAGACCATGAGGTGGCCAAGCCTGCC 
LeuLeuAspHisLysHisLeuAspHisGluValAlaLysProAla 

631 CGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGGCT 
ArgArgLysArgLeuProGluMetAlaGlnProValAspProAla 

676 CACAATGTCAGCCGCCTGCACCGGCTGCCCAGGGATTGCCAGGAG 
HisAsnValSerArgLeuHisArgLeuProArgAspCysGlnGlu 

721 CTGTTCCAGGTTGGGGAGAGGCAGAGTGGACTATTTGAAATCCAG 
LeuPheGlnValGlyGluArgGlnSerGlyLeuPheGluIleGln 

766 CCTCAGGGGTCTCCGCCATTTTTGGTGAACTGCAAGATGACCTCA 
ProGlnGlySerProProPheLeuValAsnCysLysMetThrSer 

811 GATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCTCAGTG 
AspGlyGlyTrpThrValIleGlnArgArgHisAspGlySerVal 

856 GACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGAT 
AspPheAsnArgProTrpGluAlaTyrLysAlaGlyPheGlyAsp 

901 CCCCACGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATG 
ProHisGlyGluPheTrpLeuGlyLeuGluLysValHisSerMet 

946 ATGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCGGGACTGG 
MetGlyAspArgAsnSerArgLeuAlaValGlnLeuArgAspTrp 

991 GATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGC 
AspGlyAsnAlaGluLeuLeuGlnPheSerValHisLeuGlyGly 

1036 GAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGC 
GluAspThrAlaTyrSerLeuGlnLeuThrAlaProValAlaGly 

1081 CAGCTGGGCGCCACCACCGTCCCACCCAGCGGCCTCTCCGTACCC 
GlnLeuGlyAlaThrThrValProProSerGlyLeuSerValPro 

1126 TTCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGACAAGAAC 
PheSerThrTrpAspGlnAspHisAspLeuArgArgAspLysAsn 

1171 TGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCTGCAGC 
CysAlaLysSerLeuSerGlyGlyTrpTrpPheGlyThrCysSer 

1216 CATTCCAACCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAG 
HisSerAsnLeuAsnGlyGlnTyrPheArgSerIleProGlnGln 

1261 CGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGC 
ArgGlnLysLeuLysLysGlyIlePheTrpLysThrTrpArgGly 

1306 CGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATG 
ArgTyrTyrProLeuGlnAlaThrThrMetLeuIleGlnProMet 

1351 GCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCCTGGTCCCAG 
AlaAlaGluAlaAlaSer 



1396 GCCCACGAAAGACGGTGACTCTTGGCTCTGCCCGAGGATGTGGCC 

1441 GTTCCCTGCCTGGGCAGGGGCTCCAAGGAGGGGCCATCTGGAAAC 

1486 TTGTGGACAGAGAAGAAGACCACGACTGGAGAAGCCCCCTTTCTG 

1531 AGTGCAGGGGGGCTGCATGCGTTGCCTCCTGAGATCGAGGCTGCA 

1576 GGATATGCTCAGACTCTAGAGGCGTGGACCAAGGGGCATGGAGCT 

1621 TCACTCCTTGCTGGCCAGGGAGTTGGGGACTCAGAGGGACCACTT 

1666 GGGGCCAGCCAGACTGGCCTCAATGGCGGACTCAGTCACATTGAC 

1711 TGACGGGGACCAGGGCTTGTGTGGGTCGAGAGCGCCCTCATGGTG 

1756 CTGGTGCTGTTGTGTGTAGGTCCCCTGGGGACACAAGCAGGCGCC 

1801 AATGGTATCTGGGCGGAGCTCACAGAGTTCTTGGAATAAAAGCAA 

1846 CCTCAGAACA 

1 GGTAGCCGACGCGCCGGCCGGCGCGTGACCTTGCCCCTCTTGCTC 

46 GCCTTGAAAATGGAAAAGATGCTCGCAGGCTGCTTTCTGCTGATC 
MetGluLysMetLeuAlaGlyCysPheLeuLeuIle 

91 CTCGGACAGATCGTCCTCCTCCCTGCCGAGGCCAGGGAGCGGTCA 
LeuGlyGlnIleValLeuLeuProAlaGluAlaArgGluArgSer 

136 CGTGGGAGGTCCATCTCTAGGGGCAGACACGCTCGGACCCACCCG 
ArgGlyArgSerIleSerArgGlyArgHisAlaArgThrHisPro 

181 CAGACGGCCCTTCTGGAGAGTTCCTGTGAGAACAAGCGGGCAGAC 
GlnThrAlaLeuLeuGluSerSerCysGluAsnLysArgAlaAsp 

226 CTGGTTTTCATCATTGACAGCTCTCGCAGTGTCAACACCCATGAC 
LeuValPheIleIleAspSerSerArgSerValAsnThrHisAsp 

271 TATGCAAAGGTCAAGGAGTTCATCGTGGACATCTTGCAATTCTTG 
TyrAlaLysValLysGluPheIleValAspIleLeuGlnPheLeu 

316 GACATTGGTCCTGATGTCACCCGAGTGGGCCTGCTCCAATATGGC 
AspIleGlyProAspValThrArgValGlyLeuLeuGlnTyrGly 

361 AGCACTGTCAAGAATGAGTTCTCCCTCAAGACCTTCAAGAGGAAG 
SerThrValLysAsnGluPheSerLeuLysThrPheLysArgLys 

406 TCCGAGGTGGAGCGTGCTGTCAAGAGGATGCGGCATCTGTCCACG 
SerGluValGluArgAlaValLysArgMetArgHisLeuSerThr 

451 GGCACCATGACTGGGCTGGCCATCCAGTATGCCCTGAACATCGCA 
GlyThrMetThrGlyLeuAlaIleGlnTyrAlaLeuAsnIleAla 

496 TTCTCAGAAGCAGAGGGGGCCCGGCCCCTGAGGGAGAATGTGCCA 
PheSerGluAlaGluGlyAlaArgProLeuArgGluAsnValPro 

541 CGGGTCATAATGATCGTGACGGATGGGAGACCTCAGGACTCCGTG 
ArgValIleMetIleValThrAspGlyArgProGlnAspSerVal 

586 GCCGAGGTGGCTGCTAAGGCACGGGACACGGGCATCCTAATCTTT 
AlaGluValAlaAlaLysAlaArgAspThrGlyIleLeuIlePhe

631 GCCATTGGTGTGGGCCAGGTAGACTTCAACACCTTGAAGTCCATT 
AlaIleGlyValGlyGlnValAspPheAsnThrLeuLysSerIle

676 GGGAGTGAGCCCCATGAGGACCATGTCTTCCTTGTGGCCAATTTC
GlySerGluProHisGluAspHisValPheLeuValAlaAsnPhe

721 AGCCAGATTGAGACGCTGACCTCCGTGTTCCAGAAGAAGTTGTGC 
SerGlnIleGluThrLeuThrSerValPheGlnLysLysLeuCys
766 ACGGCCCACATGTGCAGCACCCTGGAGCATAACTGTGCCCACTTC
ThrAlaHisMetCysSerThrLeuGluHisAsnCysAlaHisPhe

811 TGCATCAACATCCCTGGCTCATACGTCTGCAGGTGCAAACAAGGC
CysIleAsnIleProGlySerTyrValCysArgCysLysGlnGly

856 TACATTCTCAACTCGGATCAGACGACTTGCAGAATCCAGGATCTG
TyrIleLeuAsnSerAspGlnThrThrCysArgIleGlnAspLeu

901 TGTGCCATGGAGGACCACAACTGTGAGCAGCTCTGTGTGAATGTG 
CysAlaMetGluAspHisAsnCysGluGlnLeuCysValAsnVal 

946 CCGGGCTCCTTCGTCTGCGAGTGCTACAGTGGCTACGCCCTGGCT
ProGlySerPheValCysGluCysTyrSerGlyTyrAlaLeuAla 

991 GAGGATGGGAAGAGGTGTGTGGCTGTGGACTACTGTGCCTCAGAA 
GluAspGlyLysArgCysValAlaValAspTyrCysAlaSerGlu 

1036 AACCACGGATGTGAACATGAGTGTGTAAATGCTGATGGCTCCTAC 
AsnHisGlyCysGluHisGluCysValAsnAlaAspGlySerTyr

1081 CTTTGCCAGTGCCATGAAGGATTTGCTCTTAACCCAGATGAAAAA 
LeuCysGlnCysHisGluGlyPheAlaLeuAsnProAspGluLys

1126 ACGTGCACAAAGATAGACTACTGTGCCTCATCTAATCATGGATGT
ThrCysThrLysIleAspTyrCysAlaSerSerAsnHisGlyCys 

1171 CAGTACGAGTGTGTTAACACAGATGATTCCTATTCCTGCCACTGC
GlnTyrGluCysValAsnThrAspAspSerTyrSerCysHisCys 

1216 CTGAAAGGCTTTACCCTGAATCCAGATAAGAAAACCTGCAGAAGG 
LeuLysGlyPheThrLeuAsnProAspLysLysThrCysArgArg 

1261 ATCAACTACTGTGCACTGAACAAACCGGGCTGTGAGCATGAGTGC 
IleAsnTyrCysAlaLeuAsnLysProGlyCysGluHisGluCys

1306 GTCAACATGGAGGAGAGCTACTACTGCCGCTGCCACCGTGGCTAC 
ValAsnMetGluGluSerTyrTyrCysArgCysHisArgGlyTyr

1351 ACTCTGGACCCCAATGGCAAACCCTGCAGCCGAGTGGACCACTGT 
ThrLeuAspProAsnGlyLysProCysSerArgValAspHisCys 

1396 GCACAGCAGGACCATGGCTGTGAGCAGCTGTGTCTGAACACGGAG 
AlaGlnGlnAspHisGlyCysGluGlnLeuCysLeuAsnThrGlu 

1441 GATTCCTTCGTCTGCCAGTGCTCAGAAGGCTTCCTCATCAACGAG
AspSerPheValCysGlnCysSerGluGlyPheLeuIleAsnGlu
GACCTCAAGACCTGCTCCCGGGTGGATTACTGCCTGCTGAGTGAC
AspLeuLysThrCysSerArgValAspTyrCysLeuLeuSerAsp 

CATGGTTGTGAATACTCCTGTGTCAACATGGACAGATCCTTTGCC 
HisGlyCysGluTyrSerCysValAsnMetAspArgSerPheAla 

TGTCAGTGTCCTGAGGGACACGTGCTCCGCAGCGATGGGAAGACG 
CysGlnCysProGluGlyHisValLeuArgSerAspGlyLysThr 

TGTGCAAAATTGGACTCTTGTGCTCTGGGGGACCACGGTTGTGAA 
CysAlaLysLeuAspSerCysAlaLeuGlyAspHisGlyCysGlu 

CATTCGTGTGTAAGCAGTGAAGATTCGTTTGTGTGCCAGTGCTTT 
HisSerCysValSerSerGluAspSerPheValCysGlnCysPhe 

GAAGGTTATATACTCCGTGAAGATGGAAAAACCTGCAGAAGGAAA
GluGlyTyrIleLeuArgGluAspGlyLysThrCysArgArgLys

GATGTCTGCCAAGCTATAGACCATGGCTGTGAACACATTTGTGTG 
AspValCysGlnAlaIleAspHisGlyCysGluHisIleCysVal

AACAGTGACGACTCATACACGTGCGAGTGCTTGGAGGGATTCCGG 
AsnSerAspAspSerTyrThrCysGluCysLeuGluGlyPheArg 

CTCACTGAGGATGGGAAACGCTGCCGAATTTCCTCAGGGAAGGAT 
LeuThrGluAspGlyLysArgCysArgIleSerSerGlyLysAsp

GTCTGCAAATCAACCCACCATGGCTGCGAACACATTTGTGTTAAT 
ValCysLysSerThrHisHisGlyCysGluHisIleCysValAsn

AATGGGAATTCCTACATCTGCAAATGCTCAGAGGGATTTGTTCTA 
AsnGlyAsnSerTyrIleCysLysCysSerGluGlyPheValLeu

GCTGAGGACGGAAGACGGTGCAAGAAATGCACTGAAGGCCCAATT
AlaGluAspGlyArgArgCysLysLysCysThrGluGlyProIle

GACCTGGTCTTTGTGATCGATGGATCCAAGAGTCTTGGAGAAGAG
AspLeuValPheValIleAspGlySerLysSerLeuGlyGluGlu

AATTTTGAGGTCGTGAAGCAGTTTGTCACTGGAATTATAGATTCC 
AsnPheGluValValLysGlnPheValThrGlyIleIleAspSer

TTGACAATTTCCCCCAAAGCCGCTCGAGTGGGGCTGCTCCAGTAT 
LeuThrIleSerProLysAlaAlaArgValGlyLeuLeuGlnTyr

TCCACACAGGTCCACACAGAGTTCACTCTGAGAAACTTCAACTCA 
SerThrGlnValHisThrGluPheThrLeuArgAsnPheAsnSer 

GCCAAAGACATGAAAAAAGCCGTGGCCCACATGAAATACATGGGA
AlaLysAspMetLysLysAlaValAlaHisMetLysTyrMetGly
AAGGGCTCTATGACTGGGCTGGCCCTGAAACACATGTTTGAGAGA
LysGlySerMetThrGlyLeuAlaLeuLysHisMetPheGluArg 

2296 AGTTTTACCCAAGGAGAAGGGGCCAGGCCCCTTTTCCACAAGGGT
SerPheThrGlnGlyGluGlyAlaArgProLeuPheHisLysGly 

2341 GCCCAGAGCAGCCATTGTGTTCACCGACGGACGGGCTCAGGATGA
AlaGlnSerSerHisCysValHisArgArgThrGlySerGly

2386 CGTCTCCGAGTGGGCCAGTAAAGCCAAGGCCAATGGTATCACTAT

2431 GTATGCTGTTGGGGTAGGAAAAGCCATTGAGGAGGAACTACAAGA

2476 GATTGCCTCTGAGCCCACAAACAAGCATCTCTTCTATGCCGAAGA

2521 CTTCAGCACAATGGATGAGATAAGTGAAAAACTCAAGAAAGGCAT 

2566 CTGTGAAGCTCTAGAAGACTCCGATGGAAGACAGGACTCTCCAGC 

2611 AGGGGAACTGCCAAAAACGGTCCAACAGCCAACAGAATCTGAGCC 

2656 AGTCACCATAAATATCCAAGACCTACTTTCCTGTTCTAATTTTGC 

2701 AGTGCAACACAGATATCTGTTTGAAGAAGACAATCTTTTACGGTC

2746 TACACAAAAGCTTTCCCATTCAACAAAACCTTCAGGAAGCCCTTT

2791 GGAAGAAAAACACGATCAATGCAAATGTGAAAACCTTATAATGTT

2836 CCAGAACCTTGCAAACGAAGAAGTAAGAAAATTTACACAGCGCTT

2881 AGAAGAAATGACACAGAGAATGGAAGCCCTGGAAAATCGCCTGAG 

2926 ATACAGATGAAGATTAGAAATCGCGACACATTTGTAGTCATTGTA 

2971 TCACGGATTACAATGAACGCAGTGCAGAGCCCCAAAGCTCAGGCT 

3016 ATTGTTAAATC 
1 GGTAGCCGACGCGCCGGCCGGCGCGTGACCTTGCCCCTCTTGCTC

46 GCCTTGAAAATGGAAAAGATGCTCGCAGGCTGCTTTCTGCTGATC
MetGluLysMetLeuAlaGlyCysPheLeuLeuIle

91 CTCGGACAGATCGTCCTCCTCCCCTGCGAGGCCAGGGAGCGGTCA
LeuGlyGlnIleValLeuLeuProCysGluAlaArgGluArgSer

136 CGTGGGAGGTCCATCTCTAGGGGCAGACACGCTCGGACCCACCCG
ArgGlyArgSerIleSerArgGlyArgHisAlaArgThrHisPro

181 CAGACGGCCCTTCTGGAGAGTTCCTGTGAGAACAAGCGGGCAGAC 
GlnThrAlaLeuLeuGluSerSerCysGluAsnLysArgAlaAsp 

226 CTGGTTTTCATCATTGACAGCTCTCGCAGTGTCAACACCCATGAC 
LeuValPheIleIleAspSerSerArgSerValAsnThrHisAsp

271 TATGCAAAGGTCAAGGAGTTCATCGTGGACATCTTGCAATTCTTG
TyrAlaLysValLysGluPheIleValAspIleLeuGlnPheLeu

316 GACATTGGTCCTGATGTCACCCGAGTGGGCCTGCTCCAATATGGC 
AspIleGlyProAspValThrArgValGlyLeuLeuGlnTyrGly 

361 AGCACTGTCAAGAATGAGTTCTCCCTCAAGACCTTCAAGAGGAAG 
SerThrValLysAsnGluPheSerLeuLysThrPheLysArgLys 

406 TCCGAGGTGGAGCGTGCTGTCAAGAGGATGCGGCATCTGTCCACG
SerGluValGluArgAlaValLysArgMetArgHisLeuSerThr 

451 GGCACCATGACTGGGCTGGCCATCCAGTATGCCCTGAACATCGCA 
GlyThrMetThrGlyLeuAlaIleGlnTyrAlaLeuAsnIleAla

496 TTCTCAGAAGCAGAGGGGGCCCGGCCCCTGAGGGAGAATGTGCCA 
PheSerGluAlaGluGlyAlaArgProLeuArgGluAsnValPro 

541 CGGGTCATAATGATCGTGACGGATGGGAGACCTCAGGACTCCGTG 
ArgValIleMetIleValThrAspGlyArgProGlnAspSerVal

586 GCCGAGGTGGCTGCTAAGGCACGGGACACGGGCATCCTAATCTTT 
AlaGluValAlaAlaLysAlaArgAspThrGlyIleLeuIlePhe

631 GCCATTGGTGTGGGCCAGGTAGACTTCAACACCTTGAAGTCCATT 
AlaIleGlyValGlyGlnValAspPheAsnThrLeuLysSerIle

676 GGGAGTGAGCCCCATGAGGACCATGTCTTCCTTGTGGCCAATTTC 
GlySerGluProHisGluAspHisValPheLeuValAlaAsnPhe 

721 AGCCAGATTGAGACGCTGACCTCCGTGTTCCAGAAGAAGTTGTGC
SerGlnIleGluThrLeuThrSerValPheGlnLysLysLeuCys
766 ACGGCCCACATGTGCAGCACCCTGGAGCATAACTGTGCCCACTTC
ThrAlaHisMetCysSerThrLeuGluHisAsnCysAlaHisPhe 

811 TGCATCAACATCCCTGGCTCATACGTCTGCAGGTGCAAACAAGGC
CysIleAsnIleProGlySerTyrValCysArgCysLysGlnGly

856 TACATTCTCAACTCGGATCAGACGACTTGCAGAATCCAGGATCTG 
TyrIleLeuAsnSerAspGlnThrThrCysArgIleGlnAspLeu

901 TGTGCCATGGAGGACCACAACTGTGAGCAGCTCTGTGTGAATGTG 
CysAlaMetGluAspHisAsnCysGluGlnLeuCysValAsnVal 

946 CCGGGCTCCTTCGTCTGCGAGTGCTACAGTGGCTACGCCCTGGCT 
ProGlySerPheValCysGluCysTyrSerGlyTyrAlaLeuAla 

991 GAGGATGGGAAGAGGTGTGTGGCTGTGGACTACTGTGCCTCAGAA 
GluAspGlyLysArgCysValAlaValAspTyrCysAlaSerGlu 

1036 AACCACGGATGTGAACATGAGTGTGTAAATGCTGATGGCTCCTAC 
AsnHisGlyCysGluHisGluCysValAsnAlaAspGlySerTyr

1081 CTTTGCCAGTGCCATGAAGGATTTGCTCTTAACCCAGATGAAAAA
LeuCysGlnCysHisGluGlyPheAlaLeuAsnProAspGluLys

1126 ACGTGCACAAAGATAGACTACTGTGCCTCATCTAATCATGGATGT 
ThrCysThrLysIleAspTyrCysAlaSerSerAsnHisGlyCys

1171 CAGTACGAGTGTGTTAACACAGATGATTCCTATTCCTGCCACTGC 
GlnTyrGluCysValAsnThrAspAspSerTyrSerCysHisCys 

1216 CTGAAAGGCTTTACCCTGAATCCAGATAAGAAAACCTGCAGAAGG
LeuLysGlyPheThrLeuAsnProAspLysLysThrCysArgArg 

1261 ATCAACTACTGTGCACTGAACAAACCGGGCTGTGAGCATGAGTGC 
IleAsnTyrCysAlaLeuAsnLysProGlyCysGluHisGluCys

1306 GTCAACATGGAGGAGAGCTACTACTGCCGCTGCCACCGTGGCTAC 
ValAsnMetGluGluSerTyrTyrCysArgCysHisArgGlyTyr 

1351 ACTCTGGACCCCAATGGCAAACCCTGCAGCCGAGTGGACCACTGT 
ThrLeuAspProAsnGlyLysProCysSerArgValAspHisCys 

1396 GCACAGCAGGACCATGGCTGTGAGCAGCTGTGTCTGAACACGGAG
AlaGlnGlnAspHisGlyCysGluGlnLeuCysLeuAsnThrGlu 

1441 GATTCCTTCGTCTGCCAGTGCTCAGAAGGCTTCCTCATCAACGAG 
AspSerPheValCysGlnCysSerGluGlyPheLeuIleAsnGlu 

1486 GACCTCAAGACCTGCTCCCGGGTGGATTACTGCCTGCTGAGTGAC 
AspLeuLysThrCysSerArgValAspTyrCysLeuLeuSerAsp 
1531 CATGGTTGTGAATACTCCTGTGTCAACATGGACAGATCCTTTGCC
HisGlyCysGluTyrSerCysValAsnMetAspArgSerPheAla 

1576 TGTCAGTGTCCTGAGGGACACGTGCTCCGCAGCGATGGGAAGACG
CysGlnCysProGluGlyHisValLeuArgSerAspGlyLysThr 

1621 TGTGCAAAATTGGACTCTTGTGCTCTGGGGGACCACGGTTGTGAA 
CysAlaLysLeuAspSerCysAlaLeuGlyAspHisGlyCysGlu 

1666 CATTCGTGTGTAAGCAGTGAAGATTCGTTTGTGTGCCAGTGCTTT 
HisSerCysValSerSerGluAspSerPheValCysGlnCysPhe 

1711 GAAGGTTATATACTCCGTGAAGATGGAAAAACCTGCAGAAGGAAA
GluGlyTyrIleLeuArgGluAspGlyLysThrCysArgArgLys

1756 GATGTCTGCCAAGCTATAGACCATGGCTGTGAACACATTTGTGTG 
AspValCysGlnAlaIleAspHisGlyCysGluHisIleCysVal

1801 AACAGTGACGACTCATACACGTGCGAGTGCTTGGAGGGATTCCGG 
AsnSerAspAspSerTyrThrCysGluCysLeuGluGlyPheArg 

1846 CTCACTGAGGATGGGAAACGCTGCCGAATTTCCTCAGGGAAGGAT 
LeuThrGluAspGlyLysArgCysArgIleSerSerGlyLysAsp

1891 GTCTGCAAATCAACCCACCATGGCTGCGAACACATTTGTGTTAAT 
ValCysLysSerThrHisHisGlyCysGluHisIleCysValAsn 

1936 AATGGGAATTCCTACATCTGCAAATGCTCAGAGGGATTTGTTCTA
AsnGlyAsnSerTyrIleCysLysCysSerGluGlyPheValLeu

1981 GCTGAGGACGGAAGACGGTGCAAGAAATGCACTGAAGGCCCAATT 
AlaGluAspGlyArgArgCysLysLysCysThrGluGlyProIle

2026 GACCTGGTCTTTGTGATCGATGGATCCAAGAGTCTTGGAGAAGAG 
AspLeuValPheValIleAspGlySerLysSerLeuGlyGluGlu

2071 AATTTTGAGGTCGTGAAGCAGTTTGTCACTGGAATTATAGATTCC 
AsnPheGluValValLysGlnPheValThrGlyIleIleAspSer

2116 TTGACAATTTCCCCCAAAGCCGCTCGAGTGGGGCTGCTCCAGTAT 
LeuThrIleSerProLysAlaAlaArgValGlyLeuLeuGlnTyr

2161 TCCACACAGGTCCACACAGAGTTCACTCTGAGAAACTTCAACTCA 
SerThrGlnValHisThrGluPheThrLeuArgAsnPheAsnSer 

2206 GCCAAAGACATGAAAAAAGCCGTGGCCCACATGAAATACATGGGA 
AlaLysAspMetLysLysAlaValAlaHisMetLysTyrMetGly 
2251 AAGGGCTCTATGACTGGGCTGGCCCTGAAACACATGTTTGAGAGA
LysGlySerMetThrGlyLeuAlaLeuLysHisMetPheGluArg 

2296 AGTTTTACCCAAGGAGAAGGGGCCAGGCCCTTTTCCACAAGGGTG
SerPheThrGlnGlyGluGlyAlaArgProPheSerThrArgVal 

2341 CCCAGAGCAGCCATTGTGTTCACCGACGGACGGGCTCAGGATGAC 
ProArgAlaAlaIleValPheThrAspGlyArgAlaGlnAspAsp

2386 GTCTCCGAGTGGGCCAGTAAAGCCAAGGCCAATGGTATCACTATG 
ValSerGluTrpAlaSerLysAlaLysAlaAsnGlyIleThrMet

2431 TATGCTGTTGGGGTAGGAAAAGCCATTGAGGAGGAACTACAAGAG 
TyrAlaValGlyValGlyLysAlaIleGluGluGluLeuGlnGlu

2476 ATTGCCTCTGAGCCCACAAACAAGCATCTCTTCTATGCCGAAGAC 
IleAlaSerGluProThrAsnLysHisLeuPheTyrAlaGluAsp 

2521 TTCAGCACAATGGATGAGATAAGTGAAAAACTCAAGAAAGGCATC 
PheSerThrMetAspGluIleSerGluLysLeuLysLysGlyIle

2566 TGTGAAGCTCTAGAAGACTCCGATGGAAGACAGGACTCTCCAGCA 
CysGluAlaLeuGluAspSerAspGlyArgGlnAspSerProAla 

2611 GGGGAACTGCCAAAAACGGTCCAACAGCCAACAGAATCTGAGCCA
GlyGluLeuProLysThrValGlnGlnProThrGluSerGluPro 

2656 GTCACCATAAATATCCAAGACCTACTTTCCTGTTCTAATTTTGCA 
ValThrIleAsnIleGlnAspLeuLeuSerCysSerAsnPheAla

2701 GTGCAACACAGATATCTGTTTGAAGAAGACAATCTTTTACGGTCT 
ValGlnHisArgTyrLeuPheGluGluAspAsnLeuLeuArgSer 

2746 ACACAAAAGCTTTCCCATTCAACAAAACCTTCAGGAAGCCCTTTG 
ThrGlnLysLeuSerHisSerThrLysProSerGlySerProLeu 

2791 GAAGAAAAACACGATCAATGCAAATGTGAAAACCTTATAATGTTC 
GluGluLysHisAspGlnCysLysCysGluAsnLeuIleMetPhe 

2836 CAGAACCTTGCAAACGAAGAAGTAAGAAAATTAACACAGCGCTTA 
GlnAsnLeuAlaAsnGluGluValArgLysLeuThrGlnArgLeu 

2881 GAAGAAATGACACAGAGAATGGAAGCCCTGGAAAATCGCCTGAGA 
GluGluMetThrGlnArgMetGluAlaLeuGluAsnArgLeuArg

2926 TACAGATGAAGATTAGAAATCGCGACACATTTGTAGTCATTGTAT
TyrArg
2971 CACGGATTACAATGAACGCAGTGCAGAGCCCCAAAGCTCAGGCTA

3016 TTGTTAAATCAATAATGTTGTGAAGTAAAACAATCAGTACTGAGA 

3061 AACCTGGTTTGCCACAGAACAAAGACAAGAAGTATACACTAACTT 

3106 GTATAAATTTATCTAGGAAAAAAATCCTTCAGAATTCTAAGATGA 

3151 ATTTACCAGGTGAGAATGAATAAGCTATGCAAGGTATTTTGTAAT 

3196 ATACTGTGGACACAACTTGCTTCTGCCTCATCCTGCCTTAGTGTG 

3241 CAATCTCATTTGACTATACGATAAAGTTTGCACAGTCTTACTTCT 

3286 GTAGAACACTGGCCATAGGAAATGCTGTTTTTTTGTACTGGACTT 

3331 TACCTTGATATATGTATATGGATGTATGCATAAAATCATAGGACA

3376 TATGTACTTGTGGAACAAGTTGGATTTTTTATACAATATTAAAAT

3421 TCACCACTTCAGAGAAAAGTAAAAAAA 
1 CGGCCCTTCTCACACTCCTGCCCTGCTGATGTGGAACGGGGTTTG
46 GGGTTCTGCAGGGCTATTGTCTGCGCTGGGGAAGGGGACAGGCCG
91 GGACCGGGACCTCCGCTCGCAGCCGGCCGCACCAGCAGGACAGCT



136 GGCCTGAAGCTCAGAGCCGGGGCGTGCGCCATGGCCCCACACTGG 

MetAlaProHisTrp 

181 GCTGTCTGGCTGCTGGCAGCAAGGCTGTGGGGCCTGGGCATTGGG 
AlaValTrpLeuLeuAlaAlaArgLeuTrpGlyLeuGlylleGly 

226 GCTGAGGTGTGGTGGAACCTTGTGCCGCGTAAGACAGTGTCTTCT 
AlaGluValTrpTrpAsnLeuValProArgLysThrValSerSer 

271 GGGGAGCTGGCCACGGTAGTACGGCGGTTCTCCCAGAGCGGCATC
GlyGluLeuAlaThrValValArgArgPheSerGlnThrGlyIle

316 CAGGACTTCCTGACACTGACGCTGACGGAGCCCACTGGGCTTCTG 
GlnAspPheLeuThrLeuThrLeuThrGluProThrGlyLeuLeu 

361 TACGTGGGCGCCCGAGAGGCCCTGTTTGCCTTCAGCATGGAGGCC 
TyrValGlyAlaArgGluAlaLeuPheAlaPheSerMetGluAla

406 CTGGAGCTGCAAGGAGCGATCTCCTGGGAGGCCCCCGTGGAGAAG
LeuGluLeuGlnGlyAlaIleSerTrpGluAlaProValGluLys

451 AAGACTGAGTGTATCCAGAAAGGGAAGAACAACCAGACCGAGTGC
LysThrGluCysIleGlnLysGlyLysAsnAsnGlnThrGluCys

496 TTCAACTTCATCCGCTTCCTGCAGCCCTACAATGCCTCCCACCTG 
PheAsnPheIleArgPheLeuGlnProTyrAsnAlaSerHisLeu

541 TACGTCTGTGGCACCTACGCCTTCCAGCCCAAGTGCACCTACGTC 
TyrValCysGlyThrTyrAlaPheGlnProLysCysThrTyrVal 

586 AACATGCTCACCTTCACTTTGGAGCATGGAGAGTTTGAAGATGGG 
AsnMetLeuThrPheThrLeuGluHisGlyGluPheGluAspGly 

631 AAGGGCAAGTGTCCCTATGACCCAGCTAAGGGCCATGCTGGCCTT 
LysGlyLysCysProTyrAspProAlaLysGlyHisAlaGlyLeu 

676 CTTGTGGATGGTGAGCTGTACTCGGCCACACTCAACAACTTCCTG
LeuValAspGlyGluLeuTyrSerAlaThrLeuAsnAsnPheLeu

721 GGCACGGAACCCATTATCCTGCGTAACATGGGGCCCCACCACTCC
GlyThrGluProIleIleLeuArgAsnMetGlyProHisHisSer
766 ATGAAGACAGAGTACCTGGCCTTTTGGCTCAACGAACCTCACTTT
MetLysThrGluTyrLeuAlaPheTrpLeuAsnGluProHisPhe 

811 GTAGGCTCTGCCTATGTACCTGAGAGGGTGGGCCTGCTGTGGACA 
ValGlySerAlaTyrValProGluArgValGlyLeuLeuTrpThr 

856 ATGGCATACTCTCTTCCAGCCCTAGGAGGAGGGCTCCTAACAGTG 
MetAlaTyrSerLeuProAlaLeuGlyGlyGlyLeuLeuThrVal 

901 TAACTTATTGTGTCCCCGCGTATTTATTTGTTGTAAATATTTGAG 
946 TATTTTTATATTGACAAATAAA 
GGCACCAGGCCTTCCGGAGAGACGCAGTCGGCTGCCACCCCGGGA

M 

TGGGTCGCTGGTGCCAGACCGTCGCGCGCGGGCAGCGCCCCCGGA 
etGlyArgTrpCysGlnThrValAlaArgGlyGlnArgProArgT 

CGTCTGCCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCTGT 
hrSerAlaProSerArgAlaGlyAlaLeuLeuLeuLeuLeuLeuL 

TGCTGAGGTCTGCAGGTTGCTGGGGCGCAGGGGAAGCCCCGGGGG 
euLeuArgSerAlaGlyCysTrpGlyAlaGlyGluAlaProGlyA 

CGCTGTCCACTGCTGATCCCGCCGACCAGAGCGTCCAGTGTGTCC 
laLeuSerThrAlaAspProAlaAspGlnSerValGlnCysValP 

CCAAGGCCACCTGTCCTTCCAGCCGGCCTCGCCTTCTCTGGCAGA 
roLysAlaThrCysProSerSerArgProArgLeuLeuTrpGlnT 

CCCCGACCACCCAGACACTGCCCTCGACCACCATGGAGACCCAAT 
hrProThrThrGlnThrLeuProSerThrThrMetGluThrGlnP 

TCCCAGTTTCTGAAGGCAAAGTCGACCCATACCGCTCCTGTGGCT 
heProValSerGluGlyLysValAspProTyrArgSerCysGlyP 

TTTCCTACGAGCAGGACCCCACCCTCAGGGACCCAGAAGCCGTGG 
heSerTyrGluGlnAspProThrLeuArgAspProGluAlaValA 

CTCGGCGGTGGCCCTGGATGGTCAGCGTGCGGGCCAATGGCACAC 
laArgArgTrpProTrpMetValSerValArgAlaAsnGlyThrH

ACATCTGTGCCGGCACCATCATTGCCTCCCAGTGGGTGCTGACTG
isIleCysAlaGlyThrIleIleAlaSerGlnTrpValLeuThrV

TGGCCCACTGCCTGATCTGGCGTGATGTTATCTACTCAGTGAGGG 
alAlaHisCysLeuIleTrpArgAspValIleTyrSerValArgV

TGGGGAGTCCGTGGATTGACCAGATGACGCAGACCGCCTCCGATG 
alGlySerProTrpIleAspGlnMetThrGlnThrAlaSerAspV

TCCCGGTGCTCCAGGTCATCATGCATAGCAGGTACCGGGCCCAGC 
alProValLeuGlnValIleMetHisSerArgTyrArgAlaGlnA

GGTTCTGGTCCTGGGTGGGCCAGGCCAACGACATCGGCCTCCTCA 
rgPheTrpSerTrpValGlyGlnAlaAsnAspIleGlyLeuLeuL 

AGCTCAAGCAGGAACTCAAGTACAGCAATTACGTGCGGCCCATCT 
ysLeuLysGlnGluLeuLysTyrSerAsnTyrValArgProIleC 
721 GCCTGCCTGGCACGGACTATGTGTTGAAGGACCATTCCCGCTGCA
ysLeuProGlyThrAspTyrValLeuLysAspHisSerArgCysT 

766 CTGTGACGGGCTGGGGACTTTCCAAGGCTGACGGCATGTGGCCTC
hrValThrGlyTrpGlyLeuSerLysAlaAspGlyMetTrpProG 

811 AGTTCCGGACCATTCAGGAGAAGGAAGTCATCATCCTGAACAACA 
lnPheArgThrIleGlnGluLysGluValIleIleLeuAsnAsnL

856 AAGAGTGTGACAATTTCTACCACAACTTCACCAAAATCCCCACTC
ysGluCysAspAsnPheTyrHisAsnPheThrLysIleProThrL

901 TGGTTCAGATCATCAAGTCCCAGATGATGTGTGCGGAGGACACCC
euValGlnIleIleLysSerGlnMetMetCysAlaGluAspThrH

946 ACAGGGAGAAGTTCTGCTATGAGCTAACTGGAGAGCCCTTGGTCT 
isArgGluLysPheCysTyrGluLeuThrGlyGluProLeuValC 

991 GCTCCATGGAGGGCACGTGGTACCTGGTGGGATTGGTGAGCTGGG
ysSerMetGluGlyThrTrpTyrLeuValGlyLeuValSerTrpG

1036 GTGCAGGCTGCCAGAAGAGCGAGGCCCCACCCATCTACCTACAGG
lyAlaGlyCysGlnLysSerGluAlaProProIleTyrLeuGlnV 

1081 TCTCCTCCTACCAACACTGGATCTGGGACTGCCTCAACGGGCAGG 
alSerSerTyrGlnHisTrpIleTrpAspCysLeuAsnGlyGlnA

1126 CCCTGGCCCTGCCAGCCCCATCCAGGACCCTGCTCCTGGCACTCC
laLeuAlaLeuProAlaProSerArgThrLeuLeuLeuAlaLeuP

1171 CACTGCCCCTCAGCCTCCTTGCTGCCCTCTGACTCTGTGTGCCCT
roLeuProLeuSerLeuLeuAlaAlaLeu 

1216 CCCTCACTTGTGGGCCCCCCTTGCCTCCGTGCCCAGGTTGCTGTG 

1261 GGTGCAGCTGTCACAGCCCTGAGAGTCAGGGTGGAGATGAGGTGC

1306 TCAATTAAACATTACTGTTTTCCATGTAAAAAAAAAAAAAAAAAA 

1351 AAAAAAAAA 



Fig. 7 (continued) 
CCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGAGCCTCCTGC
81 

AGTGATGGTMGTGCTGGCCCAC^CTGMGCTCGGAGAGGCAC^ 
161 

TCCTGGCTTCCAGTCTCCCTTGCTTCCCAGATCCCAGACTC 
241 

TACGCAATCTGCGCCTGCGTCTCATCAGTCGCCCCA^ 
321 

AACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCC 
401 

GMGGAGCAGAAGGGGAGGGGCCTAACCCTGGGCTGGGGGTTGGAC 
481 

GTGTCTGCCATAGCTGGGCTCAGGCATCTGTCCTTGGCTTTGTTGCCTGGC 
561 

TGCCTCGAGCCTGACGGACACTGGGTTCAGGCTGGCAT^^ 
641 

GCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGG 
721 

AGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGGACAGCAGGTCCCC^ 
MetSerAspGluAspSerCysVama(VsGlySerI*nArgThrAlaGlyPro6lnAla61yAlaPro 
801 

TCCCCATGGCCCTGGGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGG^ 
SerProTrpProTrpGluAlaArgUuMetHisGlnGlyGlnLeuMaCysGlyGlyAlaLeu 
881 

GCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAG 
lLeuThrAlaMaHisCysPhelleGlyArgGlnAlaP^ 
961 

GGGGCCTGMGCAGCT(^TCCTGCATGGAGCCTA(^rcCACCCTGAGGGGGGCTACGACA 
rpGlyLeuLysGlnteuIleteiiHisGlyAlaTyr^ 
1041 

CAGCCTGTGACACTGGGAGCCAGCCTGCGGC(XCTC 
GlnProValThrLeuGlyAlaSerteuArgProLe^^ 
1121 

CTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAG^ 
yTrpValteuGlyArgAlaArgProGlyMaGlylleSerSerl^^^ 
1201 

CCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTC 
latysSerArgteuHisAlaMaProGlyGlyAspGlySetf^^ 
1281 

GAGCTGCCCAGCTGTGAGGTGAGCCCCAGGC(X!CCACACCTTACCTMCAGGCCCCTGG 
GluLeuProSerCysGluValSerProArgProProHisLeuThr 
1361 

AAGAACGGACCTTCCAGGCTTGGCCTCTGGACCCACCTCCCACCTGAAW 
1441 

GCCAG 

Fig. 8 
1 CTTAACAGCCACTTGTTTCATCCCACCTGGGCATTAGGTTGACTT

46 CAAAGATGCCTCAGTTACTGCAAAACATTAATGGGATCATCGAGG
MetProGlnLeuLeuGlnAsnIleAsnGlyIleIleGluA

91 CCTTCAGGCGCTATGCAAGGACGGAGGGCAACTGCACAGCGCTCA 
laPheArgArgTyrAlaArgThrGluGlyAsnCysThrAlaLeuT 

136 CCCGAGGGGAGCTGAAAAGACTCTTGGAGCAAGAGTTTGCCGATG 
hrArgGlyGluLeuLysArgLeuLeuGluGlnGluPheAlaAspV 

181 TGATTGTGAAACCCCACGATCCAGCAACTGTGGATGAGGTCCTGC 
alIleValLysProHisAspProAlaThrValAspGluValLeuA

226 GTCTGCTGGATGAAGACCACACAGGGACTGTGGAATTCAAGGAAT 
rgLeuLeuAspGluAspHisThrGlyThrValGluPheLysGluP 

271 TCCTGGTCTTAGTGTTTAAAGTTGCCCAGGCCTGTTTCAAGACAC 
heLeuValLeuValPheLysValAlaGlnAlaCysPheLysThrL 

316 TGAGCGAGAGTGCTGAGGGAGCCTGCGGCTCTCAAGAGTCTGGAA
euSerGluSerAlaGluGlyAlaCysGlySerGlnGluSerGlyS 

361 GCCTCCACTCTGGGGCCTCGCAGGAGCTGGGCGAAGGACAGAGAA 
erLeuHisSerGlyAlaSerGlnGluLeuGlyGluGlyGlnArgS 

406 GTGGCACTGAAGTGGGAAGGGCGGGGAAAGGGCAGCATTATGAGG 
erGlyThrGluValGlyArgAlaGlyLysGlyGlnHisTyrGluG

451 GGAGCAGCCACAGACAGAGCCAGCAGGGTTCCAGAGGGCAGAACA
lySerSerHisArgGlnSerGlnGlnGlySerArgGlyGlnAsnA

496 GGCCTGGGGTTCAGACCCAGGGTCAGGCCACTGGCTCTGCGTGGG 
rgProGlyValGlnThrGlnGlyGlnAlaThrGlySerAlaTrpV 

541 TCAGCAGCTATGACAGGCAAGCTGAGTCCCAGAGCCAGGAAAGAA 
alSerSerTyrAspArgGlnAlaGluSerGlnSerGlnGluArgI

586 TAAGCCCGCAGATACAACTCTCTGGGCAGACAGAGCAGACCCAGA
leSerProGlnIleGlnLeuSerGlyGlnThrGluGlnThrGlnL

631 AAGCTGGAGAAGGCAAGAGGAATCAGACAACAGAGATGAGGCCAG 
ysAlaGlyGluGlyLysArgAsnGlnThrThrGluMetArgProG 

676 AGAGACAGCCACAGACCAGGGAACAGGACAGAGCCCACCAGACAG 
luArgGlnProGlnThrArgGluGlnAspArgAlaHisGlnThrG 



Fig. 9 
GTGAGACTGTGACTGGATCTGGAACTCAGACCCAGGCAGGTGCCA
lyGluThrValThrGlySerGlyThrGlnThrGlnAlaGlyAlaT 

CCCAGACTGTGGAGCAGGACAGCAGCCACCAGACAGGAAGCACCA 
hrGlnThrValGluGlnAspSerSerHisGlnThrGlySerThrS 

GCACCCAGACACAGGAGTCCACCAATGGCCAGAACAGAGGGACTG 
erThrGlnThrGlnGluSerThrAsnGlyGlnAsnArgGlyThrG 

AGATCCACGGTCAAGGCAGGAGCCAGACCAGCCAGGCTGTGACAG 
lnIleHisGlyGlnGlyArgSerGlnThrSerGlnAlaValThrG

GAGGACACACTCAGATACAGGCAGGGTCACACACCGAGACTGTGG 
lyGlyHisThrGlnIleGlnAlaGlySerHisThrGluThrValG

AGCAGGACAGAAGCCAAACTGTAAGCCACGGAGGGGCTAGAGAAC
luGlnAspArgSerGlnThrValSerHisGlyGlyAlaArgGluG

AGGGACAGACCCAGACGCAGCCAGGCAGTGGTCAAAGATGGATGC
lnGlyGlnThrGlnThrGlnProGlySerGlyGlnArgTrpMetG

AAGTGAGCAACCCTGAGGCAGGAGAGACAGTACCGGGAGGACAGG 
lnValSerAsnProGluAlaGlyGluThrValProGlyGlyGlnA

CCCAGACTGGGGCAAGCACTGAGTCAGGAAGGCAGGAGTGGAGCA
laGlnThrGlyAlaSerThrGluSerGlyArgGlnGluTrpSerS 

GCACTCACCCAAGGCGCTGTGTGACAGAAGGGCAGGGAGACAGAC 
erThrHisProArgArgCysValThrGluGlyGlnGlyAspArgG 

AGCCCACAGTGGTTGGTGAGGAATGGGTTGATGACCACTCAAGGG
lnProThrValValGlyGluGluTrpValAspAspHisSerArgG

AGACAGTGATCCTCAGGCTGGACCAGGGCAACTTGCATACCAGTG
luThrValIleLeuArgLeuAspGlnGlyAsnLeuHisThrSerV

TTTCCTCAGCACAGGGCCAGGATGCAGCCCAGTCAGAAGAGAAGC
alSerSerAlaGlnGlyGlnAspAlaAlaGlnSerGluGluLysA 

GAGGCATCACAGCTAGAGAGCTGTATTCCTACTTGAGAAGCACCA 
rgGlyIleThrAlaArgGluLeuTyrSerTyrLeuArgSerThrL

AGCCATGACTTCCCCGACTCCAATGTCCAGTACTGGAAGAAGACA 
ysPro 

GCTGGAGAGAGTTTGGCTTGTCCTGCATGGCCAATCCAGTGGGTG 
CATCCCTGGACATCAGCTCTTCATTATGCAGCTTCCCTTTTAGGT 
CTTTCTCAATGAGATAATTTCTGCAAGGAGCTTTCTATCCTGAAC
TCTTCTTTCTTACCTGCTTTGCGGTGCAGACCCTCTCAGGAGCAG 
GAAGACTCAGAACAAGTCACCCCTT 



Fig. 9 (continued) 
Normal & Tumor Tissues

Mammary gland 30.4

Breast ca.* (pl. effusion) MCF-7 4.8

Breast ca.* (pl.ef) MDA-MB-231 2.2

Breast ca.* (pl. effusion) T47D 9.8

Breast ca. BT-549 9.2 

Breast ca. MDA-N 1.3 

Ovary 6.0 

Ovarian ca. OVCAR-3 1.6 

Ovarian ca. OVCAR-4 1.9 

Ovarian ca. OVCAR-5 7.1 

Ovarian ca. OVCAR-8 1.3 

Ovarian ca. IGROV-1 0.7 

Ovarian ca.* (ascites) SK-OV-3 2.5

Myometrium 2.3 

Uterus 6.3 

Placenta 100.0 

Prostate 13.3 

Prostate ca.* (bone met) PC-3 7.9

Testis 14.3 

Melanoma Hs688(A).T 1.4 

Melanoma* (met) Hs688(B).T 5.3 

Melanoma UACC-62 0.6 

Melanoma M14 0.9 

Melanoma LOX IMVI 1.0 

Melanoma* (met) SK-MEL-5 0.0 

Melanoma SK-MEL-28 100.0 
Figure 15. Nucleotide Sequence for CG106318-01



>CG106318-01 4810 nt 

GTCCATGGGGCCGATGTATGGGAGATGAATGTGGTCCCGGAGGCATCCAAACGAGGGCTG 

TGTGGTGTGCTCATGTGGAGGGATGGACTACACTGCATACTAACTGTAAGCAGGCCGAGA 

GACCCMTAACC^GCAGAATTGTTTCAAAGTTTGCGATTGGCACAAAGAGTTGTACGACT 

GGAGACTGGGACCTTGGAATCAGTGTCAGCCCGTGATTTCAAAAAGCCTAGAGAAACCTC 

TTGAGTGCATTAAGGGGGAAGAAGGTATTCAGGTGAGGGAGATAGCGTGCATCCAGAAAG 

ACAAAGACATTCCTGCGGAGGATATCATCTGTGAGTACTTTGAGCCCAAGCCTCTCCTGG 

AGCAGGCTTGCCTCATTCCTTGCCAGCAAGATTGCATCGTGTCTGAATTTTCTGCCTGGT 

CCGAATGCTCCAAGACCTGCGGCAGCGGGCTCCAGCACCGGACGCGTCATGTGGTGGCGC 

CCCCGCAGTTCGGAGGCTCTGGCTGTCCAAACCTGACGGAGTTCCAGGTGTGCCAATCCA 

GTCCATGCGAGGCCGAGGAGCTCAGGTACAGCCTGCATGTGGGGCCCTGGAGCACCTGCT 

CAATGCCCCACTCCCGACAAGTAAGACAAGCAAGGAGACGCGGGAAGAATAAAGAACGGG 

AAAAGGACCGCAGCAAAGGAGTAAAGGATCCAGAAGCCCGCGAGCTTATTAAGAAAAAGA 

GAAACAGAAACAGGCAGAACAGACAAGAGAACAAATATTGGGACATCCAGATTGGATATC

AGACCAGAGAGGTTATGTGCATTAACAAGACGGGGAAAGCTGCTGATTTAAGCTTTTGCC 

AGCAAGAGAAGCTTCCAATGACCTTCCAGTCCTGTGTGATCACCAAAGAGTGCX^AGGTTT 

CCGAGTGGTCAGAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCCCCTGCAG 

GCACTCGTGTAAGGACACGAACCATCAGGCAGTTTCCCATTGGCAGTGAAAAGGAGTGTC 

CAGAATTTGAAGAAAAAGAACCCTGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCA 

CGTATGGCTGGAGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTCAGTCAGC 

AGGACAAGAGGCGCGGCAACCAGACGGCCCTCTGTGGAGGGGGCATCCAGACCCGAGAGG 

TGTACTGCGTGCAGGCCAACGAAAACCTCCTCTCACAATTAAGTACCCACAAGAACAAAG 

AAGCCTCAAAGCCAATGGACTTAAAATTATGCACTGGACCTATCCCTAATACTACACAGC 

TGTGCCACATTCCTTGTCCAACTGAATGTGAAGTTTCACCTTGGTCAGCTTGGGGACCTT 

GTACTTATGAAAACTGTAATGATCAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGC 

GCATTACCAATGAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCACTTACTGG 

AAGCCATTCCCTGTGAAGAGCCTGCCTGTTATGACTGGAAAGCGGTGAGACTGGGAGACT 

GCGAGCCAGATAACGGAAAGGAGTGTGGTCCAGGCACGCAAGTTCAAGAGGTTGTGTGCA 

TCAACAGTGATGGAGAAGAAGTTGACAGACAGCTGTGCAGAGATGCCATCTTCCCCATCC 

CTGTGGCCTGTGATGCCCCATGCCCGAAAGACTGTGTGCTCAGCACATGGTCTACGTGGT 

CCTCCTGCTCACACACCTGCTCAGGGAAAACGACAGAAGGGAAACAGATACGAGCACGAT 

CCATTCTGGCCTATGCGGGTGAAGAAGGTGGAATTCGCTGTCCAAATAGCAGTGCTTTGC 

AAGAAGTACGAAGCTGTAATGAGCATCCTTGCACAGTGTACCACTGGCAAACTGGTCCCT 

GGGGCCAGTGCATTGAGGACACCTCAGTATCGTCCTTCAACACAACTACGACTTGGAATG 

GGGAGGCCTCCTGCTCTGTCGGCATGCAGACAAGAAAAGTCATCTGTGTGCGAGTCAATG 

TGGGCCAAGTGGGACCCAAAAAATGTCCTGAAAGCCTTCGACCTGAAACTGTAAGGCCTT 

GTCTGCTTCCTTGTAAGAAGGACTGTATTGTGACCCCATATAGTGACTGGACATCATGCC 

CCTCTTCGTGTAAAGAAGGGGACTCCAGTATCAGGAAGCAGTCTAGGCATCGGGTCATCA 

TTCAGCTGCCAGCCAACGGGGGCCGAGACTGCACAGATCCCCTCTATGAAGAGAAGGCCT 

GTGAGGCACCTCAAGCGTGCCAAAGCTACAGGTGGAAGACTCACAAATGGCGCAGATGCC 

AATTAGTCCCTTGGAGCGTGCAACAAGACAGCCCTGGAGCACAGGAAGGCTGTGGGCCTG 

ggcgacaggcaagagccattacttgtcgcaagcaagatggaggacaggctggaatccatg 

agtgcctacagtatgcaggccctgtgccagcccttacccaggcctgccagatcccctgcc

aggatgactgtcaattgaccagctggtccaagttttcttcatgcaatggagactgtggtg 

cagttaggaccagaaagcgcactcttgttggaaaaagtaaaaagaaggaaaaatgtaaaa 

attccc^tttgtatcccctgattgagactcagtattgtccttgtgacaaatataatgcac 

aacctgtggggaactggtc^gactgtattttaccagagggaaaagtggaagtgttgctgg 

gaatgaaagtacaaggagacatcaaggaatgcggacaaggatatcgttaccaagcaatgg 

catgctacgatcaaaatggcaggcttgtggaaacatctagatgtaacagccatggttaca 

ttgaggaggcx:tgcatcatcccctgcccctcagactgcaagctcagtgagtggtccaact 

ggtcgcgctgcagcaagtcctgtgggagtggtgtgaaggttcgttctaaatggctgcgtg 

aaaaacxatataatggaggaaggccttgccccaaactggaccatgtcaaccaggcacagg 

tgtatgaggttgtcccatgccacagtgactgcaaccagtacctatgggtcacagagccct 

ggagcatctgcaaggtgacctttgtgaatatgcgggagaactgtggagagggcgtgcaaa 

cccgaaaagtgagatgcatgcagaatacagcagatggcccttctgaacatgtagaggatt 

acctctgtgacccagaagagatgcccctgggctctagagtgtgcaaattaccatgccctg 

aggactgtgtgatatctgaatggggtccatggacccaatgtgttttgccttgcaatcaaa 

gcagtttccggcaaaggtcagctgatcccatcagacaaccagctgatgaaggaagatctt 

gccctaatgctgttgagaaagaaccctgtaacctgaacaaaaactgctaccactatgatt 

ataatgtaacagactggagtacatgtcagctgagtgagaaggcagtttgtggaaatggaa 

taaaaacaaggatgttggattgtgttcgaagtgatggcaagtcagttgacctgaaatatt 



GTGAAGCGCTTGGCTTGGAGAAGAACTGGCAGATGAACACGTCCTGCATGGTGGAATGCC 

CTGTGAACTGTCAGCTTTCTGATTGGTCTCCTTGGTCAGAATGTTCTCAAACATGTGGCC 

TCACAGGAAAAATGATCCGAAGACGAACAGTGACCCAGCCCTTTCAAGGTGATGGAAGAC 

CATGCCCTTCCCTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTATCGGTGGC 

AATATGGCCAGTGGTCTCCATGCCAAGTGCAGGAGGCCCAGTGTGGAGAAGGGACCAGAA 

CAAGGAAC^TTTCTTGTGTAGTAAGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGG 

ATGAGGAATTCTGTGCTGACATTGAACTCATTATAGATGGTAATAAAAATATGGTTCTGG 

AGGAATCCTGCAGCCAGCCTTGCCCAGGTGACTGTTATTTGAAGGACTGGTCTTCCTGGA 

GCCTGTGTCAGCTGACCTGTGTGAATGGTGAGGATCTAGGCTTTGGTGGAATACAGGTCA 

GATCCAGACCGGTGATTATACAAGAACTAGAGAATCAGCATCTGTGCCXJAGAGCAGATGT 

TAGAAACAAAATCATGTTATGATGGACAGTGCTATGAATATAAATGGATGGCCAGTGCTT 

GGAAGGGCTCTTCCCGAACAGTGTGGTGTCAAAGGTCAGATGGTATAAATGTAACAGGGG 

GCTGCTTGGTGATGAGCCAGCCTGATGCCGACAGGTCTTGTAACCCACCGTGTAGTCAAC 

CCCACTCGTACTGTAGCGAGACAAAAACATGCCATTGTGAAGAAGGGTACACTGAAGTCA 

TGTCTTCTAACAGCACCCTTGAGCAATGCACACTTATCCCCGTGGTGGTATTACCCACCA 

TGGAGGACAAAAGAGGAGATGTGAAAACCAGTCGGGCTGTACATCCAACCCAACCCTCCA 

GTAACCCAGCAGGACGGGGAAGGACCTGGTTTCTAC^GCCATTTGGGCCAGATGGGAGAC 

TAAAGACCTGGGTTTACGGTGTAGCAGCTGGGGC^TTTGTGTTACTCATCTTTATTGTCT 

CCATGATTTATCTAGCTTGCAAAAAGCCAAAGAAACCCCAAAGAAGGCAAAACAACCGAC 

TGAAACCTTTAACCTTAGCCTATGATGGAGATGCCGACATGTAACATATAACTTTTCCTG 

GCAACAACCA (SEQ ID NO: 40) 



Protein Sequence for CG106318-01 ORF Start: 18 ORF Stop: 4782 Frame: 3 
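
The ORF coordinates given in the header above can be checked against the stated protein length and reading frame. The following is a minimal sketch and is not part of the original disclosure. It assumes 1-based nucleotide coordinates, that "ORF Start" is the first base of the initiator codon, and that "ORF Stop" is the first base of the stop codon. The function name `orf_summary` is hypothetical.

```python
def orf_summary(start: int, stop: int) -> tuple[int, int]:
    """Return (reading_frame, codon_count) for 1-based ORF coordinates.

    Assumption: `start` is the first base of the start codon and `stop`
    is the first base of the stop codon, so the coding span is stop - start.
    """
    span = stop - start
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start - 1) % 3 + 1   # frame 1, 2 or 3 relative to base 1
    return frame, span // 3       # number of encoded amino acids


if __name__ == "__main__":
    # Coordinates quoted for CG106318-01: start 18, stop 4782.
    print(orf_summary(18, 4782))
```

Under these assumptions the coordinates for CG106318-01 give frame 3 and 1588 codons, which agrees with the "Frame: 3" and "1588 aa" annotations.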



Protein Sequence: 
>CG106318-01-prot 1588 aa 

MGDECGPGGIQTRAVWCAHVEGVnTLHTNCKQAERPN^ 

NQCQPVISKSLEKPLECIKGEEGIQVREIACIQKDKDIPAEDIICEYFEPKPLLEQACLI 

PCQQDCIVSEFSAWSECSKTCGSGLQHRTRHVVAPPQFGGSGCPNLTEFQVCQSSPCEAE

ELRYSLHVGPWSTCSMPHSRQVRQARRRGKNKEREKDRSKGVKDPEARELIKKKRNRNRQ 

NRQENKYWDIQIGYQTREVMCINKTGKAADLSFCQQEKLPMTFQSCVITKECQVSEWSEW

SPCSKTCHDMVSPAGTRVRTRTIRQFPIGSEKECPEFEEKErcLSQGTCWF^^TYGWRT 

TEWTECRVDPLLSQQDKRRGNQTALCGGGIQTREVYCVQANENLLSQLSTHKNKEASKPM

DLKLCTGPIPhTTTQLCHIPCPTECEVSFWSAWGPCTYENCND 

TGGSGVTGNCPHLLEAIPCEEPACYDWKAVRLGDCEPDNGKECGPGTQVQEVVCINSDGE

EVDRQLCRDAIFPIPVACDAPCPKDCVLSTWSTWSSCSHTCSGKTTEGKQIRARSILAYA

GEEGGIRCPNSSALQEVRSCNEHPCTVYHWQTGPWGtXIEDTSVSSFNTmW 

VGMQTRKVICVRVNVGQVGPKKC^ESLRPETVRPC^^ 

GDSSIRKQSRHRVIIGtPANGGRDCTDPLYEEKACEAPQACQSYRVN^^ 

V(X)DSPGAQEGCGPGRCW^TCRKCM>GGQAGIHECLQYAGPVPALTQACQIPCQDDCQL 

TSWSKFSSCNGDCGAVRTRKRTLVGKSKKKEKCKNSHLYPLiETQYO>CDKYN^ 

SDCILPEGKVEVLLGMKVQGDIKECGQGYRYQAMACYDQNGRLVETSRCNSHGYIEEACI

IPCPSDCKLSEWSNWSRC^KSCGSGVKVRSKmi^ 

CHSDCNQYLWVTEPWSICKVTFVNMRENCGEGVQTTC^^ 

EMPLGSRVC^LPCPEDCNrtSEWGFWTCH^Vl^ 

KEPCNL^NCYHYDYN\m>WST(XH-SEK^^ 

EKfWQMNTSCMVECPVNCQLSDWSFWSECSQTCGLTGK^ 

DQSKPCPVKPCYRWQYGQWSPCQVQEACX^GEGTRTRNISCWSDGSADOFSKW^ 

DIELIIDGNKNMVLEESCSQPCPGDCYLKDWSSWSLCQLTCVNGEDLGFGGIQVRSRPVI

IQELENQHLCPEQMLETKSCYDGQCYEYKWMASAWKGSSRTSW 

QPDADRSCNPPCSQPHSYCSETKTCHCEEGYTEVMSSNSTl£QCTU^ 

DVKTSRAVHPTQPSSNPAGRGRTWFLQPFGPDGRLKTWW^ 

CKKPKKPQRRQNNRLKPLTLAYDGDADM (SEQ ID NO: 41) 



Figure 16. Nucleotide and Protein Sequences for CG50817-04. 



>CG50817-04 1447 nt 

GCGGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGC

CCCACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCCCGG 

CCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGTCTGATAG 

GGAGAAGAGAAGGAGCAGAAGGGGAGGGGCCTAACCCTGGGCTGGGGGTTGGACTCACAG 

GACTGGGGGAAAGAGCTGCAATCAGAGGGTGTCTGCCATAGCTGGGCTCAGGCATCTGTC 

CTTGGCTTTGTTGCCTGGCTCCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCT 

GACGGACACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGAC 

GCTCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAG 

GGGGCAGCTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGT 

GTAGCCTGTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCC 

TGGGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAG 

GAGGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGC 

GTAGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCC 

TACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACA 

CTGGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGG 

GAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACA 

GTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGT 

GATGGCAGCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGC 

TGTGAGGCCAACCAACCAGCTGCTGACAGGGGACCTGGCCATTCTCAGGAACAAGAGAAT 

GCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCCCCACCCTGTCATGTGTGATTCCAG 

GCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGGAAGGAACCTGCCTGGGGCCACAGG 

TGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGTGGACACTCCCACACCCAACTCTGC 

TACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTACCCTTTCAGATACAATCACGCCAGC 

CACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCT^ 

ATAAATT (SEQ ID NO:42) 

Protein Sequence for CG50817-04 ORF Start: 520 ORF Stop: 1192 Frame: 1

Protein Sequence: 
>CG50817-04-prot 224 aa 

MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR
QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA
DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS
AVGELPSCEANQPAADRGPGHSQEQENAGRQMALLPLSSPPCHV (SEQ ID NO:43)



Figure 17. Nucleotide and Protein Sequences for CG50817-05. 



Nucleotide sequence encoding the Peptidase-like protein of the invention.

>CG50817-05 

CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120 
GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCCAA 180 
CCCAAGCCTCAGGAGGGCAACACAGTCCCTGGCGAGTGGCCCTGGCAGGCCAGTGTGAGG 240 
AGGCAAGGAGCCCACATCTGCAGCGGCTCCCTGGTGGCAGACACCTGGGTCCTCACTGCT 300 
GCCCACTGCTTTGAAAAGGCAGCAGCAACAGAACTGAATTCCTGCGTGAGGGACTCAGCC 360 
CCTGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTGCCCAGGGCCTATAACCACTAC 420 
AGCCAGGGCTCAGACCTGGCCCTGCTGCAGCTCGCCCACCCCACGACCCACACACCCCTC 480 
TGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGAGCCTCCTGCTGGGCCACTGGCTGG 540 
GATCAGGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGT 600 
CGCCCCACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCC 660 
CGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGGAGAT 720 
TCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCTGGCATCATC 780 
AGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAACACAGCTGCT 840 
CACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAGAGCCCAGAG 900 
ACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGGACAGCAGGT 960 
CCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCACCAGGGACAG 1020 
CTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCTTC 1080 
ATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCGGAGGAGTGG 1140 
GGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATG 1200 
GCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCCCTCTGCCTG 1260 
CCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGC 1320 
CCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCC 1380 
TGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCGGGGATGGTG 1440 
TGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGCCAACCAACCAGCTGCTGACAGG 1500 
GGACCTGGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTG 1560 
TCCTCCCCACCCTGTCATGTGTG ATTCCAGGC 1592 
(SEQ ID NO: 44) 



Protein sequence encoded by the coding sequence shown above. 

>CG50817-05 

MLLSSLVSLAGSVYLAWILFFVLYDFCIVCITTYAINVSLMWLSFRKVQEPQGQPKPQEG 60 
NTVPGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSCVRDSAPGAEEV 120 
GVAALQLPRAYNHYSQGSDLALLQLAHPTTHTPLCLPQPAHRFPFGASCWATGWDQDTSD 180 
APGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQGDSGGPVL 240 
CLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQSPETPEMSD 300 
EDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAP 360 
EEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYADHH 420 
LPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTSAVG 480 
ELPSCEANQPAADRGPGHSQEQENAGRQMALLPLSSPPCHV 521 

(SEQ ID NO:45) 



Figure 18. Nucleotide and Protein Sequences for CG50817-06. 



Nucleotide sequence encoding the Peptidase-like protein of the invention. 

>CG50817-06 

AGCGACACCTGTCCAACCCGGCCCGGCCTGGGA TGCTATGTGGGGGCCCCCAGCCTGGGG 60 
TGCAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGAC 120 
ACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTG 180 
TGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAG 240 
CTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCT 300 
GTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGG 360 
CCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGG 420 
TGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGC 480 
TGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCC 540 
ACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAG 600 
CCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTG 660 
GCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCG 720 
TGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCA 780 
GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGG 840 
CCAACCAACCAGCTGCTGACAGGGGACCTGGCCATTCTCAGGAACAAGAGAATGCAGGCA 900 
GGCAAATGGCATTACTGCCCCTGTCCTCCCCACCCTGTCATGTGTG ATTCCAGGCACCAG 960 
GGCAGGCCCAGAAGCCCAGCAGCTGTGGGAAGGAAGCTGCCTGGGGCCACAGGTGCCCAC 1020 
TCCCCACCCTGCAGGACAGGGGTGTCTGTGGACACTCCCACACCCAACTCTGCTACCAAG 1080 
CAGGCGTCTCAGCTTTCCTCCTCCTTTACCCTTTCAGATACAATCACGCCAGCCACGTTG 1140 
TTTTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAAATT 1200 

(SEQ ID NO:46) 



Protein sequence encoded by the coding sequence shown above. 

>CG50817-06 

MLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSS 60 
WLQARVQGAAFLAQSPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLAC 120 
GGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALL 180 
LLAQPVTLGASLRPLCLPYADHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSR 240 
LHAAPGGDGSPILPGMVCTSAVGELPSCEANQPAADRGPGHSQEQENAGRQMALLPLSSP 300 
PCHV 304 

(SEQ ID NO:47) 
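
The relationship between the nucleotide and protein listings for CG50817-06 can be illustrated by translating the start of the open reading frame with the standard genetic code. The following is a sketch only: the helper name `translate` is hypothetical, and the example input is simply the first 36 coding bases that begin at the ATG in the first line of SEQ ID NO:46.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (third position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon.

    Spaces are ignored; any character other than A, C, G or T raises KeyError,
    which is useful for spotting OCR damage in a listing.
    """
    dna = dna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # First 12 codons of the CG50817-06 ORF as printed in the listing.
    print(translate("ATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGC"))
```

The translated residues can be compared with the first twelve residues of the CG50817-06 protein listing (MLCGGPQPGVQG).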



Figure 19. Nucleotide and Protein Sequences For CG51099-03. 



Nucleotide sequence encoding the Serine Protease-like protein of the invention. 



>CG51099-03 

CGGAGAGACGCAGTCGGCTGCCACCCCGGGA TGGGTCGCTGGTGCCAGACCGTCGCGCGC 60 

GGGCAGCGCCCCCGGACGTCTGGCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCT^ 120 

TTGCTGAGGTCTG(^GGTTGCTGGGGCGCAGGGGAAGCCCCGGGGGCGCTGTCCACTGCT 180 

GATCCCX3CCGACCAGAGCGTCCAGTGTGTCCCC7VAGGCC^CCTGTCCTTCCAGCCGGCCT 240 

CGCCTTCTCTGGCAGACCCCGACCACCCAGAC^CTGCCCTCGACCACC^TGGAGACCCAA 300 

TTCCCAGTTTCTGAAGGCAAAGTCGACCCATACCGCTCCTGTGGCTTTTCCT 360 

GACCCCACCCTCAGGGACCCAGAAGCCGTGGCTCGGCGGTGGCCCTGGATGGTCAGCGTG 420 

CGGGCCAATGGCACACACATCTGTGCCGGCACCATCATTGCCTCC 480 

GTGGCCCACTGCCTOATCTGGCGTGATGTTAT 540 

ATTGACCAGATGACGCAGACCGCCTCCGATGTCCCGGTGCTCCAGGTCATCATGCATAGC 600 

AGGTACCGGGCCCAGCGGTTCTGGTCCTGGGTGGGCCAGGCCAACGAGATCGGCCTCCTC 660 

AAGCTCAAGCAGGAACTCAAGTACAGCAATTACGTGCGGCCCATCTGCCTGCCTGGCACG 720 

GACTATGTGTTGAAGGACC^TTCCCGCTGCACTGTGACGGGCrGGGGACTTTCCAAGGC^ 780 

GACGGCATGTGGCCTCAGTTCCGGACC^TTCAGGAGAAGGAAGTCATCATCCTGAACAAC 840 

AAAGAGTGTGACAATTTCTACCACAACTTCACCAAAATCCCCACTCTGGTTCAGATCATC 900 

AAGTCCCAGATGATGTGTGCGGAGGACACCCACAGGGAGAAGTTCTGCTATGAGCTAACT 960 

GGAGAGCCCTTGGTCTGCTCCATGGAGGGCACGTGGTACCTGGTGGGATTGGTGAGCTGG 1020 

GGTGCAGGCTGCCAGAAGAGCGAGGCCCCACCCATCTACCTACAGGTCTCCTCCTACCAA 1080 

CACTGGATCTGGG ACTGCCTCAACGGGCAGGCCCTGGCCCTGCCAGCCCCATCCAGGACC 1140 

CTGCTCCTGGCACTCCCACTGCCCCTCAGCCTCCTTGCTGCCCTCTG ACTCTGTGTGCCC 1200 

TCCCTCACTTGTGA 1214 
(SEQ ID NO:48) 

Protein sequence encoded by the nucleotide sequence shown above. 



>CG51099-03 

MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV 60 
PKATCPSSRPRLLWQTPTTQTLPSTTMETQFPVSEGKVDPYRSCGFSYEQDPTLRDPEAV 120 
ARRWFWMVSVF^NGTHICAGTMASQVmWAHCLIWRDVIYSVRVGSPWIDQMTQTAS^ 1 80 
VPN^QVIMHSRYRAQRFWSWVGQANDIGLLXLKQELXYSNWRPICLPGTDWLKDHSRC 240 
T\n"GWGLSKADGMWTOFRTIQEKEVIILNNKECDNFYHNFTKIPTLVQIIKSQMMCAEDT 300 
HREKFCYELTGEPLVCSMEGTWYLVGLVSWGAGCQKSEAPPIYLQVSSYQHWIWDCLNGQ 360 
ALALPAPSRTLLLALPLPLSLLAAL 385 (SEQ ID NO:49) 






Figure 20. Nucleotide and Protein Sequences For CG57051-04. 



Nucleotide sequence encoding the Angiopoietin-like protein, CG57051-04. 



>CG57051-04 

TGCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGT 60 
CTTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAG 120 
TCCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGA TGAGCGGTGCTCCGACGGCCGGGGC 180 
AGCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTAGATCTGGACCCGTGCA 240 
GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 300 
GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 360 
GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 420 
CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 480 
CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 540 
CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 600 
CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 660 
CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCGAGGCTGGTGGTTTGGCAC 720 
CTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGGCAGAA 780 
GCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCCAC 840 
CACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAG CGTCCTGGCTGGGCCTG 900 
GTCCCAGGCCCACGAAAGACGGTGACTCTTGGCTCTG 937 (SEQ ID NO:50) 



Protein sequence encoded by the nucleotide sequence shown above. 



>CG57051-04 

MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60 
RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120 
HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180 
LHRGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAAEA 240 
AS 242 (SEQ ID NO:51) 




Figure 21. Nucleotide and Protein Sequences For CG57051-05. 



Nucleotide sequence encoding the Angiopoietin-like protein, CG57051-05. 



TCCCAGGCTACCTAAGAGGATGAGCGGCGCTCCGAC 120 

CGCCGCCACOSCCGTGCTACTGAGCGCT^ 180 

CTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACT 240 

GCTGCXXXSAACACGCGGAGCGCACCCGCAGTCAGCT^ 300 



GAGCCGGGTGGACCCTGAGGTCCTTCACAGCCT^ 420 

CAGGATCCTVGCAACTCOTCCACAAGGTGGCCCAGCAG^ 480 

CCTGCGAATTCAGCATCTGCAAAGCC^GTTTGGCCT 540 

TGAGGGTGGCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATG^ 600 

GGCTCACAATGTCAGCCGCCTGCACCATGGAGGCT 660 

TGGCTCAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGC^ 720 

CGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCA 780 

CCTGGCCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTG^ 84 0 

CCTGGGTGGCGAGGACACGGCCnTlTAGCCTGCAGCT 900 

GGGCGCCACCACCGTCCCACCCAGCGGCCTCT^ 960 

TCACGACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCT 1020 

C^CCTG^GCCATTCCAACCTCAACGGCCA^ 1080 

GAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCX3GGGCCX3CTA 1140 

CACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCT 1200 
CTGGTCCCAGGCCCACGAAAGAGGTGACTCTTGGCTCTG 1239 (SEQ ID NO:52) 

Protein sequence for Angiopoietin-like protein, CG57051-05. 



>CG57051-05 

MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60 

RTRSQLSALERRLSACGSACQGTEGSTDLPIJVPESRVDPEVLJISI^ 120 

HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEGGKPARRKRLPEMAQPVDPAHNVSR 180 

LHHGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSIMGDRNSRLAVQLR 240 

DWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQIXSAT^ 300 

KNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQ 360 
PMAAEAAS 368 (SEQ ID NO:53) 



>CG57051-05 

(nTCGTCTCCACTCCTCGCACCTGGAA 



60 




360 



Figure 22. Nucleotide and Protein Sequences For CG57051-02. 



Nucleotide sequence encoding the Angiopoietin-like protein of the invention. 



>CG57051_02 

TGCGGATCCTCACACGACTGTGATCC^ 60 

CTTACCCCCGGTCCTCCGCGTCTCCAGTCCTC^ 120 

TCCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGA TGAGCGGTGCTCCGACGGCCGGGGC 180 

AGCCCTGATGCTCTGCGCCGCCACCGCCGTGCTA 240 

GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATC 300 

GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCT^ 360 

GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTC^GGGAACCGAGGGGTCCACCGACCT 420 

CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 480 

CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCAC^GG 540 

CCTGGAGAAGCAGGACCTGCGAACTCAGCATCTGCAA^ 600 

CAAGCACCTAGACCATGAGGTGGCaU^CCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 660 

CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCT 720 

TCAGAGGCGCC^CGATGGCTCAATGGACTTC^CCGGCCCTGGGAAGCCTAC^GGCG^ 780 

GTTTGGGGATCCCCACGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCAT 840 

GGACCGCAACAGCOJCCTGGCCGTGCAGCTGCGGGACTGGGAT^ 900 

GCAGTTCTCCGTGCACCTGGGTGGCGAGGACAO^ 960 

CGTGGCCGGCOVGCTGGGCGCCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTC 1020 

CACTTGGGACCAGGATCACGACCTCCGCAGGGACAAGAA 1080 

CCCATCGGTGGCTCAAAGACCTGACC^TGTTCCCTCTCCCCTGACCCCGGC^GGAGGCTG 1140 

GTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATCCCACA 1200 

GC^GCGGC^GAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCC 1260 
GCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAO 1315 
(SEQ ID NO: 54) 

Protein sequence for CG57051-02. 

>CG57051_02 

MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60 

RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120 

HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180 

LHHGGWTVIQRRHDGSMDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLAVQLR 240 

DVnX^AELI^FSVHIX^EDTAYSI^LTAPVAGOLGATTVPPSGLSVPFSTV^ 300 

KNCAKSLSAPSVAQRPDHVPSPLTPAGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFW 360 
KTWRGRYYPLQATTMLIQPMAAEAAS 386 (SEQ ID NO: 55) 



Figure 23. Nucleotide and Protein Sequences For CG57051-03. 

Nucleotide sequence encoding the Angiopoietin-like protein, CG57051-03. 



>CG57051-03 

CCCCGAGAGTCCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGAC 60 

GGCCGGGGCAGCCCTGATGCTCTGCGCCGCCACCXXrCXSTGCTACT 120 

ACCCXSTGCAGTCCAAGTCGCCGCGCTTTGCGTC^ 180 

CGGACTCCTGCAGCTCGGCCAGGGGCTGCGCGAAC^CGCGGAGCGCACC 240 

GAGCGCGCTGGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGCT 300 

CACCGACCTCCCX5TTAGCCCCTGAGAGCCGGGTGGACCCTGAGGT^ 360 

GAC^CAACTCAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAA 420 

GCAGCGGCACCTGGAGAAGCAGCACCTGCGAATTC^GCATCTGCAAAGC 480 

CCnX5ACCACAAG(^CCTAGACCATGAGGTGGCCAAGCCTGCC 540 

CGAGATGGCCCAGCC^GTTGACCCGGCTCACAATGTC^GCCGCCTGCACCA 600 

GACAGTAATTCAGAGGCGCCACGATGGCTCAGTGGACTTCAACCGGCC 660 

CAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGGTCT 720 

CATCACGGGGGACCGCAACAGCCGCCTGGCCGTGC^GCT 780 

CGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGAC^ 84 0 

CACTGC^CCCGTGGCO^CCAGCTGGGCG^ 900 

ACCCTTCCCCACTTGGGAC(^GGATCACGACCTCCGC^ 960 

CCTCTCTGGAGGCTGGTGGTTTGGCACCTCC^GCCATTC 1020 

CCGCTCCATCCCACAGGAGCGGCAGAAGCTTAAGAA 1080 
GGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGC 1140 
AGCCTCCTAG 1150 (SEQ ID NO: 56) 



Protein sequence for CG57051-03. 



>CG57051-03 

MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60 

RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120 

HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180 

LHHGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLAVQLR 240 

DWDDNAELLQFSVHLGGEDTAYSLQLTAPVAGQIX^^ 300 

KNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQ 360 
PMAAEAAS 368 (SEQ ID NO:57) 



